Gene Expression Profiling in Biopsied Tumor Tissues

ABSTRACT

The invention concerns sensitive methods to measure mRNA levels in biopsied tumor tissues, including archived paraffin-embedded biopsy material. The invention also concerns breast cancer gene sets important in the diagnosis and treatment of breast cancer, and methods for assigning the optimal treatment options to breast cancer patients based upon knowledge derived from gene expression studies.

CROSS-REFERENCE

This application claims the benefit under 35 U.S.C. 119(e) of provisional application Ser. Nos. 60/412,049, filed Sep. 18, 2002 and 60/364,890, filed Mar. 13, 2002, the entire disclosures of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to gene expression profiling in biopsied tumor tissues. In particular, the present invention concerns sensitive methods to measure mRNA levels in biopsied tumor tissues, including archived paraffin-embedded biopsy material. In addition, the invention provides a set of genes the expression of which is important in the diagnosis and treatment of breast cancer.

Oncologists have a number of treatment options available to them, including different combinations of chemotherapeutic drugs that are characterized as “standard of care,” and a number of drugs that do not carry a label claim for a particular cancer, but for which there is evidence of efficacy in that cancer. The best likelihood of a good treatment outcome requires that patients be assigned to the optimal available cancer treatment, and that this assignment be made as quickly as possible following diagnosis.

Currently, diagnostic tests used in clinical practice are single analyte, and therefore do not capture the potential value of knowing relationships between dozens of different markers. Moreover, diagnostic tests are frequently not quantitative, relying on immunohistochemistry. This method often yields different results in different laboratories, in part because the reagents are not standardized, and in part because the interpretations are subjective and cannot be easily quantified. RNA-based tests have not often been used because of the problem of RNA degradation over time and the fact that it is difficult to obtain fresh tissue samples from patients for analysis. Fixed paraffin-embedded tissue is more readily available and methods have been established to detect RNA in fixed tissue. However, these methods typically do not allow for the study of large numbers of genes (DNA or RNA) from small amounts of material. Thus, traditionally fixed tissue has been rarely used other than for immunohistochemistry detection of proteins.

Recently, several groups have published studies concerning the classification of various cancer types by microarray gene expression analysis (see, e.g. Golub et al., Science 286:531-537 (1999); Bhattacharjee et al., Proc. Natl. Acad. Sci. USA 98:13790-13795 (2001); Chen-Hsiang et al., Bioinformatics 17 (Suppl. 1):S316-S322 (2001); Ramaswamy et al., Proc. Natl. Acad. Sci. USA 98:15149-15154 (2001)). Certain classifications of human breast cancers based on gene expression patterns have also been reported (Martin et al., Cancer Res. 60:2232-2238 (2000); West et al., Proc. Natl. Acad. Sci. USA 98:11462-11467 (2001); Sorlie et al., Proc. Natl. Acad. Sci. USA 98:10869-10874 (2001); Yan et al., Cancer Res. 61:8375-8380 (2001)). However, these studies mostly focus on improving and refining the already established classification of various types of cancer, including breast cancer, and generally do not provide new insights into the relationships of the differentially expressed genes, and do not link the findings to treatment strategies in order to improve the clinical outcome of cancer therapy.

Although modern molecular biology and biochemistry have revealed more than 100 genes whose activities influence the behavior of tumor cells, their state of differentiation, and their sensitivity or resistance to certain therapeutic drugs, with a few exceptions, the status of these genes has not been exploited for the purpose of routinely making clinical decisions about drug treatments. One notable exception is the use of estrogen receptor (ER) protein expression in breast carcinomas to select patients for treatment with anti-estrogen drugs, such as tamoxifen. Another exceptional example is the use of ErbB2 (Her2) protein expression in breast carcinomas to select patients for treatment with the Her2 antagonist drug Herceptin® (Genentech, Inc., South San Francisco, Calif.).

Despite recent advances, the challenge of cancer treatment remains to target specific treatment regimens to pathogenically distinct tumor types, and ultimately personalize tumor treatment in order to maximize outcome. Hence, a need exists for tests that simultaneously provide predictive information about patient responses to the variety of treatment options. This is particularly true for breast cancer, the biology of which is poorly understood. It is clear that the classification of breast cancer into a few subgroups, such as the ErbB2⁺ subgroup, and subgroups characterized by low to absent gene expression of the estrogen receptor (ER) and a few additional transcriptional factors (Perou et al. Nature 406:747-752 (2000)) does not reflect the cellular and molecular heterogeneity of breast cancer, and does not allow the design of treatment strategies maximizing patient response.

SUMMARY OF THE INVENTION

The present invention provides (1) sensitive methods to measure mRNA levels in biopsied tumor tissue, (2) a set of approximately 190 genes, the expression of which is important in the diagnosis of breast cancer, and (3) the significance of abnormally low or high expression for the genes identified and included in the gene set, through activation or disruption of biochemical regulatory pathways that influence patient response to particular drugs used or potentially useful in the treatment of breast cancer. These results permit assessment of genomic evidence of the efficacy of more than a dozen relevant drugs.

The present invention accommodates the use of archived paraffin-embedded biopsy material for assay of all markers in the set, and therefore is compatible with the most widely available type of biopsy material. The invention presents an efficient method for extraction of RNA from wax-embedded, fixed tissues, which reduces the cost of a mass production process for acquiring this information without sacrificing quality of the analysis. In addition, the invention describes a novel highly effective method for amplifying mRNA copy number, which permits increased assay sensitivity and the ability to monitor expression of large numbers of different genes given the limited amounts of biopsy material. The invention also captures the predictive significance of relationships between expressions of certain markers in the breast cancer marker set. Finally, for each member of the gene set, the invention specifies the oligonucleotide sequences to be used in the test.

In one aspect, the invention concerns a method for predicting clinical outcome for a patient diagnosed with cancer, comprising

determining the expression level of one or more genes, or their expression products, selected from the group consisting of p53BP2, cathepsin B, cathepsin L, Ki67/MiB1, and thymidine kinase in a cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference cancer tissue set,

wherein a poor outcome is predicted if:

(a) the expression level of p53BP2 is in the lower 10^(th) percentile; or

(b) the expression level of either cathepsin B or cathepsin L is in the upper 10^(th) percentile; or

(c) the expression level of either Ki67/MiB1 or thymidine kinase is in the upper 10^(th) percentile.

Poor clinical outcome can be measured, for example, in terms of shortened survival or increased risk of cancer recurrence, e.g. following surgical removal of the cancer.
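Criteria (a) to (c) can be illustrated with a minimal Python sketch. It is not part of the disclosed method: the function names, the dictionary layout, and the percentile definition (percent of reference values strictly below the patient value) are assumptions made only for illustration, and "lower/upper 10^(th) percentile" is read here as a rank of at most 10 or at least 90.

```python
from bisect import bisect_left

def percentile_rank(value, reference_values):
    """Percent (0-100) of reference values strictly below `value` (assumed definition)."""
    ordered = sorted(reference_values)
    return 100.0 * bisect_left(ordered, value) / len(ordered)

def predicts_poor_outcome(patient, reference_set):
    """Apply criteria (a)-(c) to normalized expression levels.

    patient:       dict mapping gene name -> normalized level in the patient tumor
    reference_set: dict mapping gene name -> list of normalized levels in the
                   reference cancer tissue set
    Genes that were not measured in the patient are skipped.
    """
    def in_lower_10th(gene):
        return gene in patient and percentile_rank(patient[gene], reference_set[gene]) <= 10.0

    def in_upper_10th(gene):
        return gene in patient and percentile_rank(patient[gene], reference_set[gene]) >= 90.0

    return (
        in_lower_10th("p53BP2")                                          # criterion (a)
        or in_upper_10th("cathepsin B") or in_upper_10th("cathepsin L")  # criterion (b)
        or in_upper_10th("Ki67/MiB1") or in_upper_10th("thymidine kinase")  # criterion (c)
    )
```

The one-or-more-genes wording is why unmeasured genes are skipped rather than treated as errors.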

In another embodiment, the invention concerns a method of predicting the likelihood of the recurrence of cancer, following treatment, in a cancer patient, comprising determining the expression level of p27, or its expression product, in a cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference cancer tissue set, wherein an expression level in the upper 10th percentile indicates decreased risk of recurrence following treatment.

In another aspect, the invention concerns a method for classifying cancer comprising determining the expression level of two or more genes selected from the group consisting of Bcl2, hepatocyte nuclear factor 3, ER, ErbB2, and Grb7, or their expression products, in a cancer tissue, normalized against a control gene or genes, and compared to the amount found in a reference cancer tissue set, wherein (i) tumors expressing at least one of Bcl2, hepatocyte nuclear factor 3, and ER, or their expression products, above the mean expression level in the reference tissue set are classified as having a good prognosis for disease free and overall patient survival following treatment; and (ii) tumors expressing elevated levels of ErbB2 and Grb7, or their expression products, at levels ten-fold or more above the mean expression level in the reference tissue set are classified as having poor prognosis of disease free and overall patient survival following treatment.
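As an illustration only, the two classification rules can be sketched in Python. The function name and data layout are hypothetical, and the sketch assumes normalized expression values on a linear, positive scale so that "ten-fold above the mean" is meaningful. Because the text does not say which rule prevails when both apply, the sketch reports the two results separately.

```python
from statistics import mean

GOOD_MARKERS = ("Bcl2", "hepatocyte nuclear factor 3", "ER")
POOR_MARKERS = ("ErbB2", "Grb7")

def classify_tumor(patient, reference_set):
    """Evaluate rules (i) and (ii) independently.

    patient:       dict gene -> normalized level (linear scale, assumed)
    reference_set: dict gene -> list of normalized levels in the reference tissue set
    Returns a dict with boolean entries "good_prognosis" and "poor_prognosis".
    """
    ref_mean = {gene: mean(levels) for gene, levels in reference_set.items()}

    def measured(gene):
        return gene in patient and gene in ref_mean

    # Rule (i): at least one of Bcl2, HNF3, ER above the reference mean.
    good = any(measured(g) and patient[g] > ref_mean[g] for g in GOOD_MARKERS)
    # Rule (ii): both ErbB2 and Grb7 at ten-fold or more above the reference mean.
    poor = all(measured(g) and patient[g] >= 10 * ref_mean[g] for g in POOR_MARKERS)
    return {"good_prognosis": good, "poor_prognosis": poor}
```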

All types of cancer are included, such as, for example, breast cancer, colon cancer, lung cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, and brain cancer. The foregoing methods are particularly suitable for prognosis/classification of breast cancer.

In all previous aspects, in a specific embodiment, the expression level is determined using RNA obtained from a formalin-fixed, paraffin-embedded tissue sample. While all techniques of gene expression profiling, as well as proteomics techniques, are suitable for use in performing the foregoing aspects of the invention, the gene expression levels are often determined by reverse transcription polymerase chain reaction (RT-PCR).

If the source of the tissue is a formalin-fixed, paraffin-embedded tissue sample, the RNA is often fragmented.

The expression data can be further subjected to multivariate analysis, for example using the Cox Proportional Hazards model.
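For orientation, the Cox Proportional Hazards model in its standard textbook form is shown below. Treating the covariates x_i as normalized expression levels of the genes under study is an assumption about how the model would be applied here; the text does not specify the covariates.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp\!\left(\sum_{i=1}^{p} \beta_i x_i\right)
```

Here h_0(t) is the unspecified baseline hazard and each fitted coefficient \beta_i gives the log hazard ratio per unit increase in x_i, so a positive \beta_i would indicate that higher expression of gene i is associated with a worse outcome.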

In a further aspect, the invention concerns a method for the preparation of nucleic acid from a fixed, wax-embedded tissue specimen, comprising:

(a) incubating a section of the fixed, wax-embedded tissue specimen at a temperature of about 56° C. to 70° C. in a lysis buffer, in the presence of a protease, without prior dewaxing, to form a lysis solution;

(b) cooling the lysis solution to a temperature where the wax solidifies; and

(c) isolating the nucleic acid from the lysis solution.

The lysis buffer may comprise urea, such as 4M urea. In a particular embodiment, incubation in step (a) of the foregoing method is performed at about 65° C.

In another particular embodiment, the protease used in the foregoing method is proteinase K.

In another embodiment, the cooling in step (b) is performed at room temperature.

In a further embodiment, the nucleic acid is isolated after protein removal with 2.5 M NH₄OAc.

The nucleic acid can, for example, be total nucleic acid present in the fixed, wax-embedded tissue specimen.

In yet another embodiment, the total nucleic acid is isolated by precipitation from the lysis solution, following protein removal, with 2.5 M NH₄OAc. The precipitation may, for example, be performed with isopropanol.

The method described above may further comprise the step of removing DNA from the total nucleic acid, for example by DNAse treatment.

The tissue specimen may, for example, be obtained from a tumor, and the RNA may be obtained from a microdissected portion of the tissue specimen enriched for tumor cells.

All types of tumor are included, such as, without limitation, breast cancer, colon cancer, lung cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, and brain cancer, in particular breast cancer.

The method described above may further comprise the step of subjecting the RNA to gene expression profiling. Thus, the gene expression profile may be completed for a set of genes comprising at least two of the genes listed in Table 1.

Although all methods of gene expression profiling are contemplated, in a particular embodiment, gene expression profiling is performed by RT-PCR which may be preceded by an amplification step.

In another aspect, the invention concerns a method for preparing fragmented RNA for gene expression analysis, comprising the steps of:

(a) mixing the RNA with at least one gene-specific, single-stranded DNA scaffold under conditions such that fragments of the RNA complementary to the DNA scaffold hybridize with the DNA scaffold;

(b) extending the hybridized RNA fragments with a DNA polymerase to form a DNA-DNA duplex; and

(c) removing the DNA scaffold from the duplex.

In a specific embodiment, in step (a) of this method, the RNA may be mixed with a mixture of single-stranded DNA templates specific for each gene of interest.

The method can further comprise the step of heat-denaturing and reannealing the duplexed DNA to the DNA scaffold, with or without additional overlapping scaffolds, and further extending the duplexed sense strand with DNA polymerase prior to removal of the scaffold in step (c).

The DNA templates may be, but do not need to be, fully complementary to the gene of interest.

In a particular embodiment, at least one of the DNA templates is complementary to a specific segment of the gene of interest.

In another embodiment, the DNA templates include sequences complementary to polymorphic variants of the same gene.

The DNA template may include one or more dUTP or rNTP sites. In this case, in step (c) the DNA template may be removed by fragmenting the DNA template present in the DNA-DNA duplex formed in step (b) at the dUTP or rNTP sites.

In an important embodiment, the RNA is extracted from fixed, wax-embedded tissue specimens, and purified sufficiently to act as a substrate in an enzyme assay. The RNA purification may, but does not need to, include an oligo-dT based step.

In a further aspect, the invention concerns a method for amplifying RNA fragments in a sample comprising fragmented RNA representing at least one gene of interest, comprising the steps of:

(a) contacting the sample with a pool of single-stranded DNA scaffolds comprising an RNA polymerase promoter at the 5′ end under conditions such that the RNA fragments complementary to the DNA scaffolds hybridize with the DNA scaffolds;

(b) extending the hybridized RNA fragments with a DNA polymerase along the DNA scaffolds to form DNA-DNA duplexes;

(c) amplifying the gene or genes of interest by in vitro transcription; and

(d) removing the DNA scaffolds from the duplexes.

An exemplary promoter is the T7 RNA polymerase promoter, while an exemplary DNA polymerase is DNA polymerase I.

In step (d) the DNA scaffolds may be removed, for example, by treatment with DNase I.

In a further embodiment, the pool of single-stranded DNA scaffolds comprises partial or complete gene sequences of interest, such as a library of cDNA clones.

In a specific embodiment, the sample represents a whole genome or a fraction thereof. In a preferred embodiment, the genome is the human genome.

In another aspect, the invention concerns a method of preparing a personalized genomics profile for a patient, comprising the steps of:

(a) subjecting RNA extracted from a tissue obtained from the patient to gene expression analysis;

(b) determining the expression level in such tissue of at least two genes selected from the gene set listed in Table 1, wherein the expression level is normalized against a control gene or genes, and is compared to the amount found in a cancer tissue reference set; and

(c) creating a report summarizing the data obtained by the gene expression analysis.

The tissue obtained from the patient may, but does not have to, comprise cancer cells. Just as before, the cancer can, for example, be breast cancer, colon cancer, lung cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, or brain cancer, breast cancer being particularly preferred.

In a particular embodiment, the RNA is obtained from a microdissected portion of breast cancer tissue enriched for cancer cells. The control gene set may, for example, comprise β-actin and ribosomal protein LPO.

The report, prepared for the use of the patient or the patient's physician, may include the identification of at least one drug potentially beneficial in the treatment of the patient.

Step (b) of the foregoing method may comprise the step of determining the expression level of a gene specifically influencing cellular sensitivity to a drug, where the gene can, for example, be selected from the group consisting of aldehyde dehydrogenase 1A1, aldehyde dehydrogenase 1A3, amphiregulin, ARG, BRK, BCRP, CD9, CD31, CD82/KAI-1, COX2, c-abl, c-kit, c-kit L, CYP1B1, CYP2C9, DHFR, dihydropyrimidine dehydrogenase, EGF, epiregulin, ER-alpha, ErbB-1, ErbB-2, ErbB-3, ErbB-4, ER-beta, farnesyl pyrophosphate synthetase, gamma-GCS (glutamyl cysteine synthetase), GATA3, geranyl geranyl pyrophosphate synthetase, Grb7, GST-alpha, GST-pi, HB-EGF, hsp 27, human chorionic gonadotropin/CGA, IGF-1, IGF-2, IGF1R, KDR, LIV1, Lung Resistance Protein/MVP, Lot1, MDR-1, microsomal epoxide hydrolase, MMP9, MRP1, MRP2, MRP3, MRP4, PAI1, PDGF-A, PDGF-B, PDGF-C, PDGF-D, PDGFR-alpha, PDGFR-beta, PLAG1 (pleiomorphic adenoma 1), PREP prolyl endopeptidase, progesterone receptor, pS2/trefoil factor 1, PTEN, PTP1b, RAR-alpha, RAR-beta2, Reduced Folate Carrier, SXR, TGF-alpha, thymidine phosphorylase, thymidylate synthase, topoisomerase II-alpha, topoisomerase II-beta, VEGF, XIST, and YB-1.

In another embodiment, step (b) of the foregoing process includes determining the expression level of multidrug resistance factors, such as, for example, gamma-glutamyl-cysteine synthetase (GCS), GST-α, GST-π, MDR-1, MRP1-4, breast cancer resistance protein (BCRP), lung cancer resistance protein (MVP), SXR, or YB-1.

In another embodiment, step (b) of the foregoing process comprises determination of the expression level of eukaryotic translation initiation factor 4E (EIF4E).

In yet another embodiment, step (b) of the foregoing process comprises determination of the expression level of a DNA repair enzyme.

In a further embodiment, step (b) of the foregoing process comprises determination of the expression level of a cell cycle regulator, such as, for example, c-MYC, c-Src, Cyclin D1, Ha-Ras, mdm2, p14ARF, p21WAF1/CIP, p16INK4a/p14, p23, p27, p53, PI3K, PKC-epsilon, or PKC-delta.

In a still further embodiment, step (b) of the foregoing process comprises determination of the expression level of a tumor suppressor or a related protein, such as, for example, APC or E-cadherin.

In another embodiment, step (b) of the foregoing method comprises determination of the expression level of a gene regulating apoptosis, such as, for example, p53, Bcl2, Bcl-xL, Bak, Bax, and related factors, NFκB, CIAP1, CIAP2, survivin, and related factors, p53BP1/ASPP1, or p53BP2/ASPP2.

In yet another embodiment, step (b) of the foregoing process comprises determination of the expression level of a factor that controls cell invasion or angiogenesis, such as, for example, uPA, PAI1, cathepsin B, C, and L, scatter factor (HGF), c-met, KDR, VEGF, or CD31.

In a different embodiment, step (b) of the foregoing method comprises determination of the expression level of a marker for immune or inflammatory cells or processes, such as, for example, Ig light chain λ, CD18, CD3, CD68, Fas (CD95), or Fas Ligand.

In a further embodiment, step (b) of the foregoing process comprises determination of the expression level of a cell proliferation marker, such as, for example, Ki67/MiB1, PCNA, Pin1, or thymidine kinase.

In a still further embodiment, step (b) of the foregoing process comprises determination of the expression level of a growth factor or growth factor receptor, such as, for example, IGF1, IGF2, IGFBP3, IGF1R, FGF2, CSF-1, CSF-1R/fms, SCF-1, IL6 or IL8.

In another embodiment, step (b) of the foregoing process comprises determination of the expression level of a gene marker that defines a subclass of breast cancer, where the gene marker can, for example, be GRO1 oncogene alpha, Grb7, cytokeratins 5 and 17, retinol binding protein 4, hepatocyte nuclear factor 3, integrin subunit alpha 7, or lipoprotein lipase.

In a still further aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to 5-fluorouracil (5-FU) or an analog thereof, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis;

(b) determining the expression level in the tissue of thymidylate synthase mRNA, wherein the expression level is normalized against a control gene or genes, and is compared to the amount found in a reference breast cancer tissue set; and

(c) predicting patient response based on the normalized thymidylate synthase mRNA level.

Step (b) of the foregoing method can further comprise determining the expression level of dihydropyrimidine phosphorylase.

In another embodiment, step (b) of the method can further comprise determining the expression level of thymidine phosphorylase.

In yet another embodiment, a positive response to 5-FU or an analog thereof is predicted if: (i) normalized thymidylate synthase mRNA level determined in step (b) is at or below the 15^(th) percentile; or (ii) the sum of normalized expression levels of thymidylate synthase and dihydropyrimidine phosphorylase determined in step (b) is at or below the 25^(th) percentile; or (iii) the sum of normalized expression levels of thymidylate synthase, dihydropyrimidine phosphorylase, plus thymidine phosphorylase determined in step (b) is at or below the 20^(th) percentile.
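The three alternative criteria can be sketched as follows. This is a hedged illustration, not the disclosed implementation: the function names and data layout are hypothetical, the percentile is taken as the percent of reference values strictly below the patient value, and the percentile of a sum is assumed to be computed against the same sum formed for each tumor in the reference set.

```python
from bisect import bisect_left

TS = "thymidylate synthase"
DP = "dihydropyrimidine phosphorylase"
TP = "thymidine phosphorylase"

def percentile_rank(value, reference_values):
    """Percent (0-100) of reference values strictly below `value` (assumed definition)."""
    ordered = sorted(reference_values)
    return 100.0 * bisect_left(ordered, value) / len(ordered)

def predicts_5fu_response(patient, reference_tumors):
    """Return True if any of criteria (i)-(iii) is met.

    patient:          dict gene -> normalized expression level
    reference_tumors: list of dicts, one per reference tumor, with the same keys
    A criterion is skipped when the patient lacks one of its genes.
    """
    def summed_rank(genes):
        patient_sum = sum(patient[g] for g in genes)
        reference_sums = [sum(tumor[g] for g in genes) for tumor in reference_tumors]
        return percentile_rank(patient_sum, reference_sums)

    criteria = [
        ((TS,), 15.0),          # (i)   TS alone, at or below the 15th percentile
        ((TS, DP), 25.0),       # (ii)  TS + DP, at or below the 25th percentile
        ((TS, DP, TP), 20.0),   # (iii) TS + DP + TP, at or below the 20th percentile
    ]
    return any(
        all(g in patient for g in genes) and summed_rank(genes) <= cutoff
        for genes, cutoff in criteria
    )
```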

In a further embodiment, in step (b) of the foregoing method the expression level of c-myc and wild-type p53 is determined. In this case, a positive response to 5-FU or an analog thereof is predicted, if the normalized expression level of c-myc relative to the normalized expression level of wild-type p53 is in the upper 15^(th) percentile.

In a still further embodiment, in step (b) of the foregoing method, expression level of NFκB and cIAP2 is determined. In this particular embodiment, resistance to 5-FU or an analog thereof is typically predicted if the normalized expression level of NFκB and cIAP2 is at or above the 10^(th) percentile.

In another aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to methotrexate or an analog thereof, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting decreased patient sensitivity to methotrexate or analog if (i) DHFR levels are more than tenfold higher than the average expression level of DHFR in the control gene set, or (ii) the normalized expression levels of members of the reduced folate carrier (RFC) family are below the 10^(th) percentile.

In yet another aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to an anthracycline or an analog thereof, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting patient resistance or decreased sensitivity to the anthracycline or analog if (i) the normalized expression level of topoisomerase IIα is below the 10^(th) percentile, or (ii) the normalized expression level of topoisomerase IIβ is below the 10^(th) percentile, or (iii) the combined normalized topoisomerase IIα or IIβ expression levels are below the 10^(th) percentile.

In a different aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to docetaxel, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting reduced sensitivity to docetaxel if the normalized expression level of CYP1B1 is in the upper 10^(th) percentile.

The invention further concerns a method for predicting the response of a patient diagnosed with breast cancer to cyclophosphamide or an analog thereof, comprising

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting reduced sensitivity to the cyclophosphamide or analog if the sum of the expression levels of aldehyde dehydrogenase 1A1 and 1A3 is more than tenfold higher than the average of their combined expression levels in the reference tissue set.

In a further aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to anti-estrogen therapy, comprising

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set that contains both specimens negative for and positive for estrogen receptor-α (ERα) and progesterone receptor-α (PRα); and

(b) predicting patient response based upon the normalized expression levels of ERα or PRα, and at least one of microsomal epoxide hydrolase, pS2/trefoil factor 1, GATA3 and human chorionic gonadotropin.

In a specific embodiment, lack of response or decreased responsiveness is predicted if (i) the normalized expression level of microsomal epoxide hydrolase is in the upper 10^(th) percentile; or (ii) the normalized expression level of pS2/trefoil factor 1, or GATA3 or human chorionic gonadotropin is at or below the corresponding average expression level in said breast cancer tissue set, regardless of the expression level of ERα or PRα in the breast cancer tissue obtained from the patient.

In another aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to a taxane, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting reduced sensitivity to taxane if (i) no or minimal XIST expression is detected; or (ii) the normalized expression level of GST-π or prolyl endopeptidase (PREP) is in the upper 10^(th) percentile; or (iii) the normalized expression level of PLAG1 is in the upper 10^(th) percentile.

The invention also concerns a method for predicting the response of a patient diagnosed with breast cancer to cisplatin or an analog thereof, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting resistance or reduced sensitivity if the normalized expression level of ERCC1 is in the upper 10^(th) percentile.

The invention further concerns a method for predicting the response of a patient diagnosed with breast cancer to an ErbB2 or EGFR antagonist, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting patient response based on the normalized expression levels of at least one of Grb7, IGF1R, IGF1 and IGF2.

In a particular embodiment, a positive response is predicted if the normalized expression level of Grb7 is in the upper 10^(th) percentile, and the expression of IGF1R, IGF1 and IGF2 is not elevated above the 90^(th) percentile.

In a further particular embodiment, a decreased responsiveness is predicted if the expression level of at least one of IGF1R, IGF1 and IGF2 is elevated.

In another aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to a bis-phosphonate drug, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting a positive response if the breast cancer tissue obtained from the patient expresses mutant Ha-Ras and additionally expresses farnesyl pyrophosphate synthetase or geranyl pyrophosphate synthetase at a normalized expression level at or above the 90^(th) percentile.

In yet another aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to treatment with a cyclooxygenase 2 inhibitor, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting a positive response if the normalized expression level of COX2 in the breast cancer tissue obtained from the patient is at or above the 90^(th) percentile.

The invention further concerns a method for predicting the response of a patient diagnosed with breast cancer to an EGF receptor (EGFR) antagonist, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting a positive response to an EGFR antagonist, if (i) the normalized expression level of EGFR is at or above the 10^(th) percentile, and (ii) the normalized expression level of at least one of epiregulin, TGF-α, amphiregulin, ErbB3, BRK, CD9, MMP9, CD82, and Lot1 is above the 90^(th) percentile.

In another aspect, the invention concerns a method for monitoring the response of a patient diagnosed with breast cancer to treatment with an EGFR antagonist, comprising monitoring the expression level of a gene selected from the group consisting of epiregulin, TGF-α, amphiregulin, ErbB3, BRK, CD9, MMP9, CD82, and Lot1 in the patient during treatment, wherein reduction in the expression level is indicative of positive response to such treatment.

In yet another aspect, the invention concerns a method for predicting the response of a patient diagnosed with breast cancer to a drug targeting a tyrosine kinase selected from the group consisting of abl, c-kit, PDGFR-α, PDGFR-β and ARG, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set;

(b) determining the normalized expression level of a tyrosine kinase selected from the group consisting of abl, c-kit, PDGFR-α, PDGFR-β and ARG, and the cognate ligand of the tyrosine kinase, and if the normalized expression level of the tyrosine kinase is in the upper 10th percentile,

(c) determining whether the sequence of the tyrosine kinase contains any mutation,

wherein a positive response is predicted if (i) the normalized expression level of the tyrosine kinase is in the upper 10th percentile, (ii) the sequence of the tyrosine kinase contains an activating mutation, or (iii) the normalized expression level of the tyrosine kinase is normal and the expression level of the ligand is in the upper 10th percentile.
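The three alternative conditions above can be written as a small decision rule. The following is a minimal sketch under stated assumptions: "upper 10th percentile" is read as the top 10% of the reference set, and a kinase level outside that top 10% is treated as "normal" for condition (iii). The function and parameter names are hypothetical and are used only for this example.

```python
def in_upper_percentile(value, reference_values, top_percent=10.0):
    """True if `value` falls in the top `top_percent` of the reference values."""
    at_or_below = sum(1 for v in reference_values if v <= value)
    return 100.0 * at_or_below / len(reference_values) > 100.0 - top_percent


def predicts_kinase_drug_response(kinase_level, ligand_level,
                                  kinase_reference, ligand_reference,
                                  has_activating_mutation):
    """Apply conditions (i) to (iii) to normalized expression levels."""
    kinase_high = in_upper_percentile(kinase_level, kinase_reference)
    ligand_high = in_upper_percentile(ligand_level, ligand_reference)
    if kinase_high:                      # condition (i)
        return True
    if has_activating_mutation:          # condition (ii)
        return True
    return ligand_high                   # condition (iii): kinase assumed normal here
```

Whether a low kinase level should also count as "normal" in condition (iii) is not stated in the text; the sketch makes the permissive choice and that choice is an assumption.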

Another aspect of the invention is a method for predicting the response of a patient diagnosed with breast cancer to treatment with an anti-angiogenic drug, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) predicting a positive response if (i) the normalized expression level of VEGF is in the upper 10th percentile and (ii) the normalized expression level of KDR or CD31 is in the upper 20th percentile.

A further aspect of the invention is a method for predicting the likelihood that a patient diagnosed with breast cancer develops resistance to a drug interacting with the MRP-1 gene coding for the multidrug resistance protein P-glycoprotein, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis to determine the expression level of PTP1b, wherein the expression level is normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) concluding that the patient is likely to develop resistance to said drug if the normalized expression level of the MRP-1 gene is above the 90th percentile.

The invention further relates to a method for predicting the likelihood that a patient diagnosed with breast cancer develops resistance to a chemotherapeutic drug or toxin used in cancer treatment, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) determining the normalized expression levels of at least one of the following genes: MDR1, GSTα, GST-π, SXR, BCRP, YB-1, and LRP/MVP, wherein the finding of a normalized expression level in the upper 4th percentile is an indication that the patient is likely to develop resistance to the drug.

Also included herein is a method for measuring the translational efficiency of VEGF mRNA in a breast cancer tissue sample, comprising determining the expression levels of the VEGF and EIF4E mRNA in the sample, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein a higher normalized EIF4E expression level for the same VEGF expression level is indicative of relatively higher translational efficiency for VEGF.

In another aspect, the invention provides a method for predicting the response of a patient diagnosed with breast cancer to a VEGF antagonist, comprising determining the expression level of VEGF and EIF4E mRNA normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein a VEGF expression level above the 90th percentile and an EIF4E expression level above the 50th percentile is a predictor of good patient response.

The invention further provides a method for predicting the likelihood of the recurrence of breast cancer in a patient diagnosed with breast cancer, comprising determining the ratio of p53:p21 mRNA expression or p53:mdm2 mRNA expression in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein an above normal ratio is indicative of a higher risk of recurrence. Typically, a higher risk of recurrence is indicated if the ratio is in the upper 10th percentile.
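The ratio test described above can be illustrated with a short sketch. It assumes linear-scale, positive normalized expression values, so that the ratio is a simple quotient, and it reads "upper 10th percentile" as the top 10% of the ratios observed in the reference set. The function name and the pair-list representation of the reference set are assumptions made for this example.

```python
def indicates_higher_recurrence_risk(p53_level, partner_level,
                                     reference_pairs, top_percent=10.0):
    """Compare a patient's p53:p21 (or p53:mdm2) ratio with reference ratios.

    `reference_pairs` is a list of (p53, partner) normalized levels, one pair
    per specimen in the reference breast cancer tissue set.
    """
    ratio = p53_level / partner_level
    reference_ratios = [p53 / partner for p53, partner in reference_pairs]
    at_or_below = sum(1 for r in reference_ratios if r <= ratio)
    rank = 100.0 * at_or_below / len(reference_ratios)
    return rank > 100.0 - top_percent
```

If expression is instead reported on a log scale, such as cycle threshold values, the quotient would be replaced by a difference; the percentile comparison is unchanged.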

In yet another aspect, the invention concerns a method for predicting the likelihood of the recurrence of breast cancer in a breast cancer patient following surgery, comprising determining the expression level of cyclin D1 in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein an expression level in the upper 10th percentile indicates increased risk of recurrence following surgery. In a particular embodiment of this method, the patient is subjected to adjuvant chemotherapy, if the expression level is in the upper 10th percentile.

Another aspect of the invention is a method for predicting the likelihood of the recurrence of breast cancer in a breast cancer patient following surgery, comprising determining the expression level of APC or E-cadherin in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein an expression level in the upper 5th percentile indicates high risk of recurrence following surgery, and heightened risk of shortened survival.

A further aspect of the invention is a method for predicting the response of a patient diagnosed with breast cancer to treatment with a proapoptotic drug comprising determining the expression levels of Bcl2 and c-MYC in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein (i) a Bcl2 expression level in the upper 10th percentile in the absence of elevated expression of c-MYC indicates good response, and (ii) a good response is not indicated if the expression level of c-MYC is elevated, regardless of the expression level of Bcl2.

A still further aspect of the invention is a method for predicting treatment outcome for a patient diagnosed with breast cancer, comprising the steps of:

(a) subjecting RNA extracted from a breast cancer tissue obtained from the patient to gene expression analysis, wherein gene expression levels are normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set; and

(b) determining the normalized expression levels of NFκB and at least one gene selected from the group consisting of cIAP1, cIAP2, XIAP, and Survivin,

wherein a poor prognosis is indicated if the expression levels for NFκB and at least one of the genes selected from the group consisting of cIAP1, cIAP2, XIAP, and Survivin are in the upper 5th percentile.

The invention further concerns a method for predicting treatment outcome for a patient diagnosed with breast cancer, comprising determining the expression levels of p53BP1 and p53BP2 in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein a poor outcome is predicted if the expression level of either p53BP1 or p53BP2 is in the lower 10th percentile.

The invention additionally concerns a method for predicting treatment outcome for a patient diagnosed with breast cancer, comprising determining the expression levels of uPA and PAI1 in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein (i) a poor outcome is predicted if the expression levels of uPA and PAI1 are in the upper 20th percentile, and (ii) a decreased risk of recurrence is predicted if the expression levels of uPA and PAI1 are not elevated above the mean observed in the breast cancer reference set. In a particular embodiment, poor outcome is measured in terms of shortened survival or increased risk of cancer recurrence following surgery. In another particular embodiment, uPA and PAI1 are expressed at normal levels, and the patient is subjected to adjuvant chemotherapy following surgery.

Another aspect of the invention is a method for predicting treatment outcome in a patient diagnosed with breast cancer, comprising determining the expression levels of cathepsin B and cathepsin L in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein a poor outcome is predicted if the expression level of either cathepsin B or cathepsin L is in the upper 10th percentile. Just as before, poor treatment outcome may be measured, for example, in terms of shortened survival or increased risk of cancer recurrence.

A further aspect of the invention is a method for devising the treatment of a patient diagnosed with breast cancer, comprising the steps of:

(a) determining the expression levels of scatter factor and c-met in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, and

(b) suggesting prompt aggressive chemotherapeutic treatment if the expression levels of scatter factor and c-met, or the combination of both, are above the 90th percentile.

A still further aspect of the invention is a method for predicting treatment outcome for a patient diagnosed with breast cancer, comprising determining the expression levels of VEGF, CD31, and KDR in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein a poor treatment outcome is predicted if the expression level of any of VEGF, CD31, and KDR is in the upper 10th percentile.

Yet another aspect of the invention is a method for predicting treatment outcome for a patient diagnosed with breast cancer, comprising determining the expression levels of Ki67/MiB1, PCNA, Pin1, and thymidine kinase in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein a poor treatment outcome is predicted if the expression level of any of Ki67/MiB1, PCNA, Pin1, and thymidine kinase is in the upper 10th percentile.

The invention further concerns a method for predicting treatment outcome for a patient diagnosed with breast cancer, comprising determining the expression level of soluble and full length CD95 in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein the presence of soluble CD95 correlates with poor patient survival.

The invention also concerns a method for predicting treatment outcome for a patient diagnosed with breast cancer, comprising determining the expression levels of IGF1, IGF1R and IGFBP3 in a breast cancer tissue obtained from the patient, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein a poor treatment outcome is predicted if the sum of the expression levels of IGF1, IGF1R and IGFBP3 is in the upper 10th percentile.

The invention additionally concerns a method for classifying breast cancer comprising determining the expression level of two or more genes selected from the group consisting of Bcl1, hepatocyte nuclear factor 3, LIV1, ER, lipoprotein lipase, retinol binding protein 4, integrin α7, cytokeratin 5, cytokeratin 17, GRO oncogene, ErbB2 and Grb7, in a breast cancer tissue, normalized against a control gene or genes, and compared to the amount found in a reference breast cancer tissue set, wherein (i) tumors expressing at least one of Bcl1, hepatocyte nuclear factor 3, LIV1, and ER above the mean expression level in the reference tissue set are classified as having a good prognosis for disease free and overall patient survival following surgical removal; (ii) tumors characterized by elevated expression of at least one of lipoprotein lipase, retinol binding protein 4, and integrin α7 compared to the reference tissue set are classified as having intermediate prognosis of disease free and overall patient survival following surgical removal; and (iii) tumors expressing either elevated levels of cytokeratins 5 and 17, and GRO oncogene at levels four-fold or greater above the mean expression level in the reference tissue set, or ErbB2 and Grb7 at levels ten-fold or more above the mean expression level in the reference tissue set are classified as having poor prognosis of disease free and overall patient survival following surgical removal.
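The three-way classification above can be sketched as a simple rule set. In this sketch each gene's value is its normalized expression divided by the mean in the reference tissue set, "elevated" is taken to mean a value greater than 1, and the poor-prognosis rule is checked first when several rules match. The order of precedence, the "unclassified" fallback and the function name are assumptions of this example; the text does not specify them.

```python
GOOD_MARKERS = ("Bcl1", "hepatocyte nuclear factor 3", "LIV1", "ER")
INTERMEDIATE_MARKERS = ("lipoprotein lipase", "retinol binding protein 4", "integrin α7")


def classify_tumor(fold_over_mean):
    """Classify a tumor from a mapping of gene name to fold over the reference mean."""

    def fold(gene):
        # Genes that were not measured are treated as not elevated.
        return fold_over_mean.get(gene, 0.0)

    cytokeratin_group = (fold("cytokeratin 5") > 1.0
                         and fold("cytokeratin 17") > 1.0
                         and fold("GRO oncogene") >= 4.0)
    erbb2_group = fold("ErbB2") >= 10.0 and fold("Grb7") >= 10.0
    if cytokeratin_group or erbb2_group:
        return "poor"
    if any(fold(gene) > 1.0 for gene in INTERMEDIATE_MARKERS):
        return "intermediate"
    if any(fold(gene) > 1.0 for gene in GOOD_MARKERS):
        return "good"
    return "unclassified"
```

For example, a tumor with ErbB2 at twelve-fold and Grb7 at ten-fold over the reference mean is placed in the poor-prognosis class under this sketch even if ER is also above the mean.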

Another aspect of the invention is a panel of two or more gene specific primers selected from the group consisting of the forward and reverse primers listed in Table 2.

Yet another aspect of the invention is a method for reverse transcription of a fragmented RNA population in RT-PCR amplification, comprising using a multiplicity of gene specific primers as the reverse primers in the amplification reaction. In a particular embodiment, the method uses between two and about 40,000 gene specific primers in the same amplification reaction. In another embodiment, the gene specific primers are about 18 to 24 bases, such as about 20 bases in length. In another embodiment, the Tm of the primers is about 58-60° C. The primers can, for example, be selected from the group consisting of the forward and reverse primers listed in Table 2.

The invention also concerns a method of reverse transcriptase driven first strand cDNA synthesis, comprising using a gene specific primer of about 18 to 24 bases in length and having a Tm optimum between about 58° C. and about 60° C. In a particular embodiment, the first strand cDNA synthesis is followed by PCR DNA amplification, and the primer serves as the reverse primer that drives the PCR amplification. In another embodiment, the method uses a plurality of gene specific primers in the same first strand cDNA synthesis reaction mixture. The number of the gene specific primers can, for example, be between 2 and about 40,000.
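The length and Tm criteria for gene specific primers can be checked programmatically. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough approximation for short oligonucleotides; the disclosure does not state how Tm is to be calculated, so the choice of formula is an assumption, as are the function names. The sequences used with it would be candidate primers supplied by the user.

```python
def wallace_tm(primer):
    """Approximate melting temperature in degrees C by the Wallace rule."""
    sequence = primer.upper()
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    return 2 * at_count + 4 * gc_count


def is_suitable_primer(primer, min_length=18, max_length=24,
                       tm_min=58, tm_max=60):
    """True if the primer meets the length and Tm ranges given in the text."""
    length_ok = min_length <= len(primer) <= max_length
    tm_ok = tm_min <= wallace_tm(primer) <= tm_max
    return length_ok and tm_ok
```

A nearest-neighbor Tm model would generally be preferred for actual primer design; the Wallace rule is used here only to keep the example self-contained.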

In a different aspect, the invention concerns a method of predicting the likelihood of long-term survival of a breast cancer patient without the recurrence of breast cancer, following surgical removal of the primary tumor, comprising determining the expression level of one or more prognostic RNA transcripts or their product in a breast cancer tissue sample obtained from said patient, normalized against the expression level of all RNA transcripts or their products in said breast cancer tissue sample, or of a reference set of RNA transcripts or their products, wherein the prognostic transcript is the transcript of one or more genes selected from the group consisting of: FOXM1, PRAME, Bcl2, STK15, CEGP1, Ki-67, GSTM1, CA9, PR, BBC3, NME1, SURV, GATA3, TFRC, YB-1, DPYD, GSTM3, RPS6KB1, Src, Chk1, ID1, EstR1, p27, CCNB1, XIAP, Chk2, CDC25B, IGF1R, AK055699, PI3KC2A, TGFB3, BAGI1, CYP3A4, EpCAM, VEGFC, pS2, hENT1, WISP1, HNF3A, NFKBp65, BRCA2, EGFR, TK1, VDR, Contig51037, pENT1, EPHX1, IF1A, DIABLO, CDH1, HIF1α, IGFBP3, CTSB, and Her2, wherein overexpression of one or more of FOXM1, PRAME, STK15, Ki-67, CA9, NME1, SURV, TFRC, YB-1, RPS6KB1, Src, Chk1, CCNB1, Chk2, CDC25B, CYP3A4, EpCAM, VEGFC, hENT1, BRCA2, EGFR, TK1, VDR, EPHX1, IF1A, Contig51037, CDH1, HIF1α, IGFBP3, CTSB, Her2, and pENT1 indicates a decreased likelihood of long-term survival without breast cancer recurrence, and the overexpression of one or more of Bcl2, CEGP1, GSTM1, PR, BBC3, GATA3, DPYD, GSTM3, ID1, EstR1, p27, XIAP, IGF1R, AK055699, PI3KC2A, TGFB3, BAGI1, pS2, WISP1, HNF3A, NFKBp65, and DIABLO indicates an increased likelihood of long-term survival without breast cancer recurrence.

In a particular embodiment of this method, the expression level of at least 2, preferably at least 5, more preferably at least 10, most preferably at least 15 prognostic transcripts or their expression products is determined.

When the breast cancer is invasive breast carcinoma, including both estrogen receptor (ER) overexpressing (ER positive) and ER negative tumors, the analysis includes determination of the expression levels of the transcripts of at least two of the following genes, or their expression products: FOXM1, PRAME, Bcl2, STK15, CEGP1, Ki-67, GSTM1, PR, BBC3, NME1, SURV, GATA3, TFRC, YB-1, DPYD, Src, CA9, Contig51037, RPS6K1 and Her2.

When the breast cancer is ER positive invasive breast carcinoma, the analysis includes determination of the expression levels of the transcripts of at least two of the following genes, or their expression products: PRAME, Bcl2, FOXM1, DIABLO, EPHX1, HIF1A, VEGFC, Ki-67, IGF1R, VDR, NME1, GSTM3, Contig51037, CDC25B, CTSB, p27, CDH1, and IGFBP3.

Just as before, it is preferred to determine the expression levels of at least 5, more preferably at least 10, most preferably at least 15 genes, or their respective expression products.

In a particular embodiment, the expression level of one or more prognostic RNA transcripts is determined, where RNA may, for example, be obtained from a fixed, wax-embedded breast cancer tissue specimen of the patient. The isolation of RNA can, for example, be carried out following any of the procedures described above or throughout the application, or by any other method known in the art.

In yet another aspect, the invention concerns an array comprising polynucleotides hybridizing to the following genes: FOXM1, PRAME, Bcl2, STK15, CEGP1, Ki-67, GSTM1, PR, BBC3, NME1, SURV, GATA3, TFRC, YB-1, DPYD, CA9, Contig51037, RPS6K1 and Her2, immobilized on a solid surface.

In a particular embodiment, the array comprises polynucleotides hybridizing to the following genes: FOXM1, PRAME, Bcl2, STK15, CEGP1, Ki-67, GSTM1, CA9, PR, BBC3, NME1, SURV, GATA3, TFRC, YB-1, DPYD, GSTM3, RPS6KB1, Src, Chk1, ID1, EstR1, p27, CCNB1, XIAP, Chk2, CDC25B, IGF1R, AK055699, PI3KC2A, TGFB3, BAGI1, CYP3A4, EpCAM, VEGFC, pS2, hENT1, WISP1, HNF3A, NFKBp65, BRCA2, EGFR, TK1, VDR, Contig51037, pENT1, EPHX1, IF1A, CDH1, HIF1α, IGFBP3, CTSB, Her2 and DIABLO.

In a further aspect, the invention concerns a method of predicting the likelihood of long-term survival of a patient diagnosed with invasive breast cancer, without the recurrence of breast cancer, following surgical removal of the primary tumor, comprising the steps of:

(1) determining the expression levels of the RNA transcripts or the expression products of genes of a gene set selected from the group consisting of

(a) Bcl2, cyclinG1, NFKBp65, NME1, EPHX1, TOP2B, DR5, TERC, Src, DIABLO;
(b) Ki67, XIAP, hENT1, TS, CD9, p27, cyclinG1, pS2, NFKBp65, CYP3A4;
(c) GSTM1, XIAP, Ki67, TS, cyclinG1, p27, CYP3A4, pS2, NFKBp65, ErbB3;
(d) PR, NME1, XIAP, upa, cyclinG1, Contig51037, TERC, EPHX1, ALDH1A3, CTSL;
(e) CA9, NME1, TERC, cyclinG1, EPHX1, DPYD, Src, TOP2B, NFKBp65, VEGFC;
(f) TFRC, XIAP, Ki67, TS, cyclinG1, p27, CYP3A4, pS2, ErbB3, NFKBp65;
(g) Bcl2, PRAME, cyclinG1, FOXM1, NFKBp65, TS, XIAP, Ki67, CYP3A4, p27;
(h) FOXM1, cyclinG1, XIAP, Contig51037, PRAME, TS, Ki67, PDGFRa, p27, NFKBp65;
(i) PRAME, FOXM1, cyclinG1, XIAP, Contig51037, TS, Ki67, PDGFRa, p27, NFKBp65;
(j) Ki67, XIAP, PRAME, hENT1, Contig51037, TS, CD9, p27, ErbB3, cyclinG1;
(k) STK15, XIAP, PRAME, PLAUR, p27, CTSL, CD18, PREP, p53, RPS6KB1;
(l) GSTM1, XIAP, PRAME, p27, Contig51037, ErbB3, GSTp, EREG, ID1, PLAUR;
(m) PR, PRAME, NME1, XIAP, PLAUR, cyclinG1, Contig51037, TERC, EPHX1, DR5;
(n) CA9, FOXM1, cyclinG1, XIAP, TS, Ki67, NFKBp65, CYP3A4, GSTM3, p27;
(o) TFRC, XIAP, PRAME, p27, Contig51037, ErbB3, DPYD, TERC, NME1, VEGFC; and
(p) CEGP1, PRAME, hENT1, XIAP, Contig51037, ErbB3, DPYD, NFKBp65, ID1, TS

in a breast cancer tissue sample obtained from said patient, normalized against the expression levels of all RNA transcripts or their products in said breast cancer tissue sample, or of a reference set of RNA transcripts or their products;

(2) subjecting the data obtained in step (1) to statistical analysis; and

(3) determining whether the likelihood of said long-term survival has increased or decreased.

In a still further aspect, the invention concerns a method of predicting the likelihood of long-term survival of a patient diagnosed with estrogen receptor (ER)-positive invasive breast cancer, without the recurrence of breast cancer, following surgical removal of the primary tumor, comprising the steps of:

(1) determining the expression levels of the RNA transcripts or the expression products of genes of a gene set selected from the group consisting of

(a) PRAME, p27, IGFBP2, HIF1A, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
(b) Contig51037, EPHX1, Ki67, TIMP2, cyclinG1, DPYD, CYP3A4, TP, AIB1, CYP2C8;
(c) Bcl2, hENT1, FOXM1, Contig51037, cyclinG1, Contig46653, PTEN, CYP3A4, TIMP2, AREG;
(d) HIF1A, PRAME, p27, IGFBP2, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
(e) IGF1R, PRAME, EPHX1, Contig51037, cyclinG1, Bcl2, NME1, PTEN, TBP, TIMP2;
(f) FOXM1, Contig51037, VEGFC, TBP, HIF1A, DPYD, RAD51C, DCR3, cyclinG1, BAG1;
(g) EPHX1, Contig51037, Ki67, TIMP2, cyclinG1, DPYD, CYP3A4, TP, AIB1, CYP2C8;
(h) Ki67, VEGFC, VDR, GSTM3, p27, upa, ITGA7, rhoC, TERC, Pin1;
(i) CDC25B, Contig51037, hENT1, Bcl2, HLAG, TERC, NME1, upa, ID1, CYP;
(j) VEGFC, Ki67, VDR, GSTM3, p27, upa, ITGA7, rhoC, TERC, Pin1;
(k) CTSB, PRAME, p27, IGFBP2, EPHX1, CTSL, BAD, DR5, DCR3, XIAP;
(l) DIABLO, Ki67, hENT1, TIMP2, ID1, p27, KRT19, IGFBP2, TS, PDGFB;
(m) p27, PRAME, IGFBP2, HIF1A, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
(n) CDH1, PRAME, VEGFC, HIF1A, DPYD, TIMP2, CYP3A4, EstR1, RBP4, p27;
(o) IGFBP3, PRAME, p27, Bcl2, XIAP, EstR1, Ki67, TS, Src, VEGF;
(p) GSTM3, PRAME, p27, IGFBP3, XIAP, FGF2, hENT1, PTEN, EstR1, APC;
(q) hENT1, Bcl2, FOXM1, Contig51037, CyclinG1, Contig46653, PTEN, CYP3A4, TIMP2, AREG;
(r) STK15, VEGFC, PRAME, p27, GCLC, hENT1, ID1, TIMP2, EstR1, MCP1;
(s) NME1, PRAME, p27, IGFBP3, XIAP, PTEN, hENT1, Bcl2, CYP3A4, HLAG;
(t) VDR, Bcl2, p27, hENT1, p53, PI3KC2A, EIF4E, TFRC, MCM3, ID1;
(u) EIF4E, Contig51037, EPHX1, cyclinG1, Bcl2, DR5, TBP, PTEN, NME1, HER2;
(v) CCNB1, PRAME, VEGFC, HIF1A, hENT1, GCLC, TIMP2, ID1, p27, upa;
(w) ID1, PRAME, DIABLO, hENT1, p27, PDGFRa, NME1, BIN1, BRCA1, TP;
(x) FBXO5, PRAME, IGFBP3, p27, GSTM3, hENT1, XIAP, FGF2, TS, PTEN;
(y) GUS, HIF1A, VEGFC, GSTM3, DPYD, hENT1, FBXO5, CA9, CYP, KRT18; and
(z) Bclx, Bcl2, hENT1, Contig51037, HLAG, CD9, ID1, BRCA1, BIN1, HBEGF;

(2) subjecting the data obtained in step (1) to statistical analysis; and

(3) determining whether the likelihood of said long-term survival has increased or decreased.

In a different aspect, the invention concerns an array comprising polynucleotides hybridizing to a gene set selected from the group consisting of

(a) Bcl2, cyclinG1, NFKBp65, NME1, EPHX1, TOP2B, DR5, TERC, Src, DIABLO;
(b) Ki67, XIAP, hENT1, TS, CD9, p27, cyclinG1, pS2, NFKBp65, CYP3A4;
(c) GSTM1, XIAP, Ki67, TS, cyclinG1, p27, CYP3A4, pS2, NFKBp65, ErbB3;
(d) PR, NME1, XIAP, upa, cyclinG1, Contig51037, TERC, EPHX1, ALDH1A3, CTSL;
(e) CA9, NME1, TERC, cyclinG1, EPHX1, DPYD, Src, TOP2B, NFKBp65, VEGFC;
(f) TFRC, XIAP, Ki67, TS, cyclinG1, p27, CYP3A4, pS2, ErbB3, NFKBp65;
(g) Bcl2, PRAME, cyclinG1, FOXM1, NFKBp65, TS, XIAP, Ki67, CYP3A4, p27;
(h) FOXM1, cyclinG1, XIAP, Contig51037, PRAME, TS, Ki67, PDGFRa, p27, NFKBp65;
(i) PRAME, FOXM1, cyclinG1, XIAP, Contig51037, TS, Ki67, PDGFRa, p27, NFKBp65;
(j) Ki67, XIAP, PRAME, hENT1, Contig51037, TS, CD9, p27, ErbB3, cyclinG1;
(k) STK15, XIAP, PRAME, PLAUR, p27, CTSL, CD18, PREP, p53, RPS6KB1;
(l) GSTM1, XIAP, PRAME, p27, Contig51037, ErbB3, GSTp, EREG, ID1, PLAUR;
(m) PR, PRAME, NME1, XIAP, PLAUR, cyclinG1, Contig51037, TERC, EPHX1, DR5;
(n) CA9, FOXM1, cyclinG1, XIAP, TS, Ki67, NFKBp65, CYP3A4, GSTM3, p27;
(o) TFRC, XIAP, PRAME, p27, Contig51037, ErbB3, DPYD, TERC, NME1, VEGFC; and
(p) CEGP1, PRAME, hENT1, XIAP, Contig51037, ErbB3, DPYD, NFKBp65, ID1, TS,

immobilized on a solid surface.

In an additional aspect, the invention concerns an array comprising polynucleotides hybridizing to a gene set selected from the group consisting of:

(a) PRAME, p27, IGFBP2, HIF1A, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
(b) Contig51037, EPHX1, Ki67, TIMP2, cyclinG1, DPYD, CYP3A4, TP, AIB1, CYP2C8;
(c) Bcl2, hENT1, FOXM1, Contig51037, cyclinG1, Contig46653, PTEN, CYP3A4, TIMP2, AREG;
(d) HIF1A, PRAME, p27, IGFBP2, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
(e) IGF1R, PRAME, EPHX1, Contig51037, cyclinG1, Bcl2, NME1, PTEN, TBP, TIMP2;
(f) FOXM1, Contig51037, VEGFC, TBP, HIF1A, DPYD, RAD51C, DCR3, cyclinG1, BAG1;
(g) EPHX1, Contig51037, Ki67, TIMP2, cyclinG1, DPYD, CYP3A4, TP, AIB1, CYP2C8;
(h) Ki67, VEGFC, VDR, GSTM3, p27, upa, ITGA7, rhoC, TERC, Pin1;
(i) CDC25B, Contig51037, hENT1, Bcl2, HLAG, TERC, NME1, upa, ID1, CYP;
(j) VEGFC, Ki67, VDR, GSTM3, p27, upa, ITGA7, rhoC, TERC, Pin1;
(k) CTSB, PRAME, p27, IGFBP2, EPHX1, CTSL, BAD, DR5, DCR3, XIAP;
(l) DIABLO, Ki67, hENT1, TIMP2, ID1, p27, KRT19, IGFBP2, TS, PDGFB;
(m) p27, PRAME, IGFBP2, HIF1A, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
(n) CDH1, PRAME, VEGFC, HIF1A, DPYD, TIMP2, CYP3A4, EstR1, RBP4, p27;
(o) IGFBP3, PRAME, p27, Bcl2, XIAP, EstR1, Ki67, TS, Src, VEGF;
(p) GSTM3, PRAME, p27, IGFBP3, XIAP, FGF2, hENT1, PTEN, EstR1, APC;
(q) hENT1, Bcl2, FOXM1, Contig51037, CyclinG1, Contig46653, PTEN, CYP3A4, TIMP2, AREG;
(r) STK15, VEGFC, PRAME, p27, GCLC, hENT1, ID1, TIMP2, EstR1, MCP1;
(s) NME1, PRAME, p27, IGFBP3, XIAP, PTEN, hENT1, Bcl2, CYP3A4, HLAG;
(t) VDR, Bcl2, p27, hENT1, p53, PI3KC2A, EIF4E, TFRC, MCM3, ID1;
(u) EIF4E, Contig51037, EPHX1, cyclinG1, Bcl2, DR5, TBP, PTEN, NME1, HER2;
(v) CCNB1, PRAME, VEGFC, HIF1A, hENT1, GCLC, TIMP2, ID1, p27, upa;
(w) ID1, PRAME, DIABLO, hENT1, p27, PDGFRa, NME1, BIN1, BRCA1, TP;
(x) FBXO5, PRAME, IGFBP3, p27, GSTM3, hENT1, XIAP, FGF2, TS, PTEN;
(y) GUS, HIF1A, VEGFC, GSTM3, DPYD, hENT1, FBXO5, CA9, CYP, KRT18; and
(z) Bclx, Bcl2, hENT1, Contig51037, HLAG, CD9, ID1, BRCA1, BIN1, HBEGF,

immobilized on a solid surface.

In all aspects, the polynucleotides can be cDNAs (“cDNA arrays”) that are typically about 500 to 5000 bases long, although shorter or longer cDNAs can also be used and are within the scope of this invention. Alternatively, the polynucleotides can be oligonucleotides (DNA microarrays), which are typically about 20 to 80 bases long, although shorter and longer oligonucleotides are also suitable and are within the scope of the invention. The solid surface can, for example, be glass or nylon, or any other solid surface typically used in preparing arrays, such as microarrays, and is typically glass.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a chart illustrating the overall workflow of the process of the invention for measurement of gene expression. In the Figure, FPET stands for “fixed paraffin-embedded tissue,” and “RT-PCR” stands for “reverse transcriptase PCR.” RNA concentration is determined by using the commercial RiboGreen™ RNA Quantitation Reagent and Protocol.

FIG. 2 is a flow chart showing the steps of an RNA extraction method according to the invention alongside a flow chart of a representative commercial method.

FIG. 3 is a scheme illustrating the steps of an improved method for preparing fragmented mRNA for expression profiling analysis.

FIG. 4 illustrates methods for amplification of RNA prior to RT-PCR.

FIG. 5 illustrates an alternative scheme for repair and amplification of fragmented mRNA.

FIG. 6 shows the measurement of estrogen receptor mRNA levels in 40 FPE breast cancer specimens via RT-PCR. Three 10 micron sections were used for each measurement. Each data point represents the average of triplicate measurements.

FIG. 7 shows the results of the measurement of progesterone receptor mRNA levels in 40 FPE breast cancer specimens via RT-PCR performed as described in the legend of FIG. 6 above.

FIG. 8 shows results from an IVT/RT-PCR experiment.

FIG. 9 is a representation of the expression of 92 genes across 70 FPE breast cancer specimens. The y-axis shows expression as cycle threshold times. These genes are a subset of the genes listed in Table 1.

Table 1 shows a breast cancer gene list.

Table 2 sets forth amplicon and primer sequences used for amplification of fragmented mRNA.

Table 3 shows the Accession Nos. and SEQ ID NOS of the breast cancer genes examined.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT A. Definitions

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994), and March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992), provide one skilled in the art with a general guide to many of the terms used in the present application.

One skilled in the art will recognize many methods and materials similar or equivalent to those described herein, which could be used in the practice of the present invention. Indeed, the present invention is in no way limited to the methods and materials described. For purposes of the present invention, the following terms are defined below.

The term “microarray” refers to an ordered arrangement of hybridizable array elements, preferably polynucleotide probes, on a substrate.

The term “polynucleotide,” when used in singular or plural, generallyrefers to any polyribonucleotide or polydeoxyribonucleotide, which maybe unmodified RNA or DNA or modified RNA or DNA. Thus, for instance,polynucleotides as defined herein include, without limitation, single-and double-stranded DNA, DNA including single- and double-strandedregions, single- and double-stranded RNA, and RNA including single- anddouble-stranded regions, hybrid molecules comprising DNA and RNA thatmay be single-stranded or, more typically, double-stranded or includesingle- and double-stranded regions. In addition, the term“polynucleotide” as used herein refers to triple-stranded regionscomprising RNA or DNA or both RNA and DNA. The strands in such regionsmay be from the same molecule or from different molecules. The regionsmay include all of one or more of the molecules, but more typicallyinvolve only a region of some of the molecules. One of the molecules ofa triple-helical region often is an oligonucleotide. The term“polynucleotide” specifically includes DNAs and RNAs that contain one ormore modified bases. Thus, DNAs or RNAs with backbones modified forstability or for other reasons are “polynucleotides” as that term isintended herein. Moreover, DNAs or RNAs comprising unusual bases, suchas inosine, or modified bases, such as tritiated bases, are includedwithin the term “polynucleotides” as defined herein. In general, theterm “polynucleotide” embraces all chemically, enzymatically and/ormetabolically modified forms of unmodified polynucleotides, as well asthe chemical forms of DNA and RNA characteristic of viruses and cells,including simple and complex cells.

The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.

The terms “differentially expressed gene,” “differential gene expression” and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically cancer, such as breast cancer, relative to its expression in a normal or control subject. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes, or a comparison of the ratios of the expression between two or more genes, or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease, specifically cancer, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages. For the purpose of this invention, “differential gene expression” is considered to be present when there is at least an about two-fold, preferably at least about four-fold, more preferably at least about six-fold, most preferably at least about ten-fold difference between the expression of a given gene in normal and diseased subjects, or in various stages of disease development in a diseased subject.
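The fold-difference thresholds in this definition can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and tier labels are hypothetical, the expression levels are assumed to be positive values on a linear scale, and the “about” in each threshold is treated as a hard cutoff, which the definition itself does not require.

```python
def fold_difference(normal: float, diseased: float) -> float:
    """Return the fold difference (always >= 1) between two positive expression levels."""
    if normal <= 0 or diseased <= 0:
        raise ValueError("expression levels must be positive")
    ratio = diseased / normal
    # A decrease is reported as the reciprocal so that up- and
    # down-regulation are judged against the same thresholds.
    return ratio if ratio >= 1 else 1 / ratio


# Thresholds from the definition above, most stringent first.
# Treating "about" as an exact cutoff is an assumption of this sketch.
TIERS = [
    (10.0, "most preferred (>= 10-fold)"),
    (6.0, "more preferred (>= 6-fold)"),
    (4.0, "preferred (>= 4-fold)"),
    (2.0, "differential (>= 2-fold)"),
]


def classify(normal: float, diseased: float) -> str:
    """Name the most stringent tier met by the pair of expression levels."""
    fold = fold_difference(normal, diseased)
    for cutoff, label in TIERS:
        if fold >= cutoff:
            return label
    return "not differential (< 2-fold)"


if __name__ == "__main__":
    print(classify(100.0, 450.0))
    print(classify(100.0, 8.0))
    print(classify(10.0, 15.0))
```

Reporting a decrease as its reciprocal keeps a single threshold table for genes expressed at either a higher or a lower level.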

The phrase “gene amplification” refers to a process by which multiple copies of a gene or gene fragment are formed in a particular cell or cell line. The duplicated region (a stretch of amplified DNA) is often referred to as an “amplicon.” Usually, the amount of the messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases in proportion to the number of copies made of the particular gene expressed.

The term “prognosis” is used herein to refer to the prediction of the likelihood of cancer-attributable death or progression, including recurrence, metastatic spread, and drug resistance, of a neoplastic disease, such as breast cancer. The term “prediction” is used herein to refer to the likelihood that a patient will respond either favorably or unfavorably to a drug or set of drugs, and also the extent of those responses. The predictive methods of the present invention can be used clinically to make treatment decisions by choosing the most appropriate treatment modalities for any particular patient. The predictive methods of the present invention are valuable tools in predicting if a patient is likely to respond favorably to a treatment regimen, such as surgical intervention, chemotherapy with a given drug or drug combination, and/or radiation therapy.

The term “increased resistance” to a particular drug or treatment option, when used in accordance with the present invention, means decreased response to a standard dose of the drug or to a standard treatment protocol.

The term “decreased sensitivity” to a particular drug or treatment option, when used in accordance with the present invention, means decreased response to a standard dose of the drug or to a standard treatment protocol, where decreased response can be compensated for (at least partially) by increasing the dose of drug, or the intensity of treatment.

“Patient response” can be assessed using any endpoint indicating a benefit to the patient, including, without limitation, (1) inhibition, to some extent, of tumor growth, including slowing down and complete growth arrest; (2) reduction in the number of tumor cells; (3) reduction in tumor size; (4) inhibition (i.e., reduction, slowing down or complete stopping) of tumor cell infiltration into adjacent peripheral organs and/or tissues; (5) inhibition (i.e., reduction, slowing down or complete stopping) of metastasis; (6) enhancement of anti-tumor immune response, which may, but does not have to, result in the regression or rejection of the tumor; (7) relief, to some extent, of one or more symptoms associated with the tumor; (8) increase in the length of survival following treatment; and/or (9) decreased mortality at a given point of time following treatment.

The term “treatment” refers to both therapeutic treatment and prophylactic or preventative measures, wherein the object is to prevent or slow down (lessen) the targeted pathologic condition or disorder. Those in need of treatment include those already with the disorder as well as those prone to have the disorder or those in whom the disorder is to be prevented. In tumor (e.g., cancer) treatment, a therapeutic agent may directly decrease the pathology of tumor cells, or render the tumor cells more susceptible to treatment by other therapeutic agents, e.g., radiation and/or chemotherapy.

The term “tumor,” as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.

The terms “cancer” and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth. Examples of cancer include, but are not limited to, breast cancer, colon cancer, lung cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, and brain cancer.

The “pathology” of cancer includes all phenomena that compromise the well-being of the patient. This includes, without limitation, abnormal or uncontrollable cell growth, metastasis, interference with the normal functioning of neighboring cells, release of cytokines or other secretory products at abnormal levels, suppression or aggravation of inflammatory or immunological response, neoplasia, premalignancy, malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, etc.

“Stringency” of hybridization reactions is readily determinable by one of ordinary skill in the art, and generally is an empirical calculation dependent upon probe length, washing temperature, and salt concentration. In general, longer probes require higher temperatures for proper annealing, while shorter probes need lower temperatures. Hybridization generally depends on the ability of denatured DNA to reanneal when complementary strands are present in an environment below their melting temperature. The higher the degree of desired homology between the probe and hybridizable sequence, the higher the relative temperature which can be used. As a result, it follows that higher relative temperatures would tend to make the reaction conditions more stringent, while lower temperatures less so. For additional details and explanation of stringency of hybridization reactions, see Ausubel et al., Current Protocols in Molecular Biology, Wiley Interscience Publishers, (1995).

“Stringent conditions” or “high stringency conditions”, as defined herein, typically: (1) employ low ionic strength and high temperature for washing, for example 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate at 50° C.; (2) employ during hybridization a denaturing agent, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride, 75 mM sodium citrate at 42° C.; or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at 42° C. in 0.2×SSC (sodium chloride/sodium citrate) and 50% formamide at 55° C., followed by a high-stringency wash consisting of 0.1×SSC containing EDTA at 55° C.

“Moderately stringent conditions” may be identified as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, New York: Cold Spring Harbor Press, 1989, and include the use of washing solution and hybridization conditions (e.g., temperature, ionic strength and % SDS) less stringent than those described above. An example of moderately stringent conditions is overnight incubation at 37° C. in a solution comprising: 20% formamide, 5×SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C. The skilled artisan will recognize how to adjust the temperature, ionic strength, etc. as necessary to accommodate factors such as probe length and the like.

In the context of the present invention, reference to “at least one,” “at least two,” “at least five,” etc. of the genes listed in any particular gene set means any one or any and all combinations of the genes listed.

The terms “splicing” and “RNA splicing” are used interchangeably and refer to RNA processing that removes introns and joins exons to produce mature mRNA with continuous coding sequence that moves into the cytoplasm of a eukaryotic cell.

In theory, the term “exon” refers to any segment of an interrupted gene that is represented in the mature RNA product (B. Lewin, Genes IV, Cell Press, Cambridge, Mass., 1990). In theory, the term “intron” refers to any segment of DNA that is transcribed but removed from within the transcript by splicing together the exons on either side of it. Operationally, exon sequences occur in the mRNA sequence of a gene as defined by Ref. Seq ID numbers. Operationally, intron sequences are the intervening sequences within the genomic DNA of a gene, bracketed by exon sequences and having GT and AG splice consensus sequences at their 5′ and 3′ boundaries.

B. Detailed Description

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, and biochemistry, which are within the skill of the art. Such techniques are explained fully in the literature, such as, “Molecular Cloning: A Laboratory Manual”, 2^(nd) edition (Sambrook et al., 1989); “Oligonucleotide Synthesis” (M. J. Gait, ed., 1984); “Animal Cell Culture” (R. I. Freshney, ed., 1987); “Methods in Enzymology” (Academic Press, Inc.); “Handbook of Experimental Immunology”, 4^(th) edition (D. M. Weir & C. C. Blackwell, eds., Blackwell Science Inc., 1987); “Gene Transfer Vectors for Mammalian Cells” (J. M. Miller & M. P. Calos, eds., 1987); “Current Protocols in Molecular Biology” (F. M. Ausubel et al., eds., 1987); and “PCR: The Polymerase Chain Reaction”, (Mullis et al., eds., 1994).

1. Gene Expression Profiling

In general, methods of gene expression profiling can be divided into two large groups: methods based on hybridization analysis of polynucleotides, and methods based on sequencing of polynucleotides. The most commonly used methods known in the art for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology 106:247-283 (1999)); RNAse protection assays (Hod, Biotechniques 13:852-854 (1992)); and reverse transcription polymerase chain reaction (RT-PCR) (Weis et al., Trends in Genetics 8:263-264 (1992)). Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS).

2. Reverse Transcriptase PCR (RT-PCR)

Of the techniques listed above, the most sensitive and most flexible quantitative method is RT-PCR, which can be used to compare mRNA levels in different sample populations, in normal and tumor tissues, with or without drug treatment, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure.

The first step is the isolation of mRNA from a target sample. The starting material is typically total RNA isolated from human tumors or tumor cell lines, and corresponding normal tissues or cell lines, respectively. Thus RNA can be isolated from a variety of primary tumors, including breast, lung, colon, prostate, brain, liver, kidney, pancreas, spleen, thymus, testis, ovary, uterus, etc., tumor, or tumor cell lines, with pooled DNA from healthy donors. If the source of mRNA is a primary tumor, mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples.

General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andrés et al., BioTechniques 18:42044 (1995). In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits include MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumor can be isolated, for example, by cesium chloride density gradient centrifugation.

As RNA cannot serve as a template for PCR, the first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.

Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. Thus, TaqMan® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. A third oligonucleotide, or probe, is designed to detect nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.

TaqMan® RT-PCR can be performed using commercially available equipment, such as, for example, ABI PRISM 7700™ Sequence Detection System™ (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA), or Lightcycler (Roche Molecular Biochemicals, Mannheim, Germany). In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700™ Sequence Detection System™. The system consists of a thermocycler, laser, charge-coupled device (CCD), camera and computer. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real-time through fiber optic cables for all 96 wells, and detected at the CCD. The system includes software for running the instrument and for analyzing the data.

5′-Nuclease assay data are initially expressed as C_(t), or the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (C_(t)).
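The notion of a threshold cycle can be made concrete with a small Python sketch. The text does not specify how statistical significance is judged; the rule used here (signal exceeding the baseline mean by ten baseline standard deviations, with the baseline taken from cycles 3 to 15) is a common convention adopted purely as an assumption, and the function name and fluorescence values are hypothetical.

```python
from statistics import mean, stdev


def threshold_cycle(fluorescence, baseline_cycles=(3, 15), n_sd=10.0):
    """Return the first 1-based cycle whose signal exceeds the baseline cutoff.

    fluorescence    -- one reading per cycle, in cycle order
    baseline_cycles -- inclusive 1-based cycle range used as baseline (assumed)
    n_sd            -- number of baseline standard deviations above the
                       baseline mean that counts as significant (assumed)
    Returns None if the signal never crosses the cutoff.
    """
    start, end = baseline_cycles
    baseline = fluorescence[start - 1:end]
    cutoff = mean(baseline) + n_sd * stdev(baseline)
    for cycle, signal in enumerate(fluorescence, start=1):
        # Only cycles after the baseline window can be the threshold cycle.
        if cycle > end and signal > cutoff:
            return cycle
    return None


if __name__ == "__main__":
    # Hypothetical trace: 15 noisy baseline cycles, then a rising signal.
    trace = [1.0, 1.1] * 7 + [1.0] + [1.2, 1.4, 2.5, 5.0, 10.0]
    print(threshold_cycle(trace))
```

A lower threshold cycle corresponds to a more abundant starting template, since fewer amplification cycles are needed for the signal to rise above background.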

To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
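Normalization against an internal standard can be sketched in Python as follows. This is a minimal illustration, not the procedure of the invention: the function names are hypothetical, the sample values and test gene symbols (ERBB2, ESR1) are invented for the example, β-actin is written with its gene symbol ACTB, and converting a normalized value to relative expression as 2^(−ΔC_(t)) assumes that the amount of product doubles in every cycle.

```python
def normalize_ct(ct_values, reference_genes=("GAPDH", "ACTB")):
    """Subtract the mean C_t of the reference genes from each test gene's C_t.

    ct_values       -- mapping of gene symbol to threshold cycle for one sample
    reference_genes -- housekeeping genes used as the internal standard
    Returns a mapping of test gene to delta-C_t (reference genes omitted).
    """
    ref_mean = sum(ct_values[g] for g in reference_genes) / len(reference_genes)
    return {
        gene: ct - ref_mean
        for gene, ct in ct_values.items()
        if gene not in reference_genes
    }


def relative_expression(delta_ct):
    """Convert delta-C_t to expression relative to the reference genes.

    Assumes perfect doubling per cycle, which real assays only approximate.
    """
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical threshold cycles for a single sample.
    sample = {"GAPDH": 20.0, "ACTB": 22.0, "ERBB2": 24.0, "ESR1": 19.0}
    for gene, d_ct in normalize_ct(sample).items():
        print(gene, d_ct, relative_expression(d_ct))
```

Because every gene in a sample is measured against the same reference mean, differences in the amount or quality of input RNA between samples largely cancel out.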

A more recent variation of the RT-PCR technique is the real time quantitative PCR, which measures PCR product accumulation through a dual-labeled fluorogenic probe (i.e., TaqMan® probe). Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g. Heid et al., Genome Research 6:986-994 (1996).

3. Microarrays

Differential gene expression can also be identified or confirmed using the microarray technique. Thus, the expression profile of breast cancer-associated genes can be measured in either fresh or paraffin-embedded tumor tissue, using microarray technology. In this method, polynucleotide sequences of interest are plated, or arrayed, on a microchip substrate. The arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest. Just as in the RT-PCR method, the source of mRNA typically is total RNA isolated from human tumors or tumor cell lines, and corresponding normal tissues or cell lines. Thus RNA can be isolated from a variety of primary tumors or tumor cell lines. If the source of mRNA is a primary tumor, mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples, which are routinely prepared and preserved in everyday clinical practice.

In a specific embodiment of the microarray technique, PCR amplified inserts of cDNA clones are applied to a substrate in a dense array. Preferably at least 10,000 nucleotide sequences are applied to the substrate. The microarrayed genes, immobilized on the microchip at 10,000 elements each, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996)). Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip technology, or Incyte's microarray technology.
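For the dual color case, the relative abundance of each transcript in the two RNA sources is commonly summarized as a log2 ratio of the two channel intensities. The Python sketch below illustrates that calculation under stated assumptions: the function names, the intensity floor, and the example intensities are hypothetical, and background subtraction and channel normalization, which a real analysis would need, are omitted.

```python
import math


def log2_ratios(tumor, normal, floor=1.0):
    """Return log2(tumor / normal) for every element measured in both channels.

    tumor, normal -- mappings of array element name to fluorescence intensity
    floor         -- minimum intensity, to avoid division by zero (assumed)
    """
    return {
        gene: math.log2(max(tumor[gene], floor) / max(normal[gene], floor))
        for gene in tumor
        if gene in normal
    }


def at_least_two_fold(ratios):
    """Names of elements whose intensities differ at least two-fold in either direction."""
    return sorted(gene for gene, value in ratios.items() if abs(value) >= 1.0)


if __name__ == "__main__":
    # Hypothetical intensities for three arrayed elements.
    tumor = {"geneA": 800.0, "geneB": 100.0, "geneC": 50.0}
    normal = {"geneA": 200.0, "geneB": 100.0, "geneC": 200.0}
    ratios = log2_ratios(tumor, normal)
    print(ratios)
    print(at_least_two_fold(ratios))
```

On the log2 scale a two-fold increase and a two-fold decrease are symmetric (+1 and −1), which is why the cutoff is applied to the absolute value.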

The development of microarray methods for large-scale analysis of gene expression makes it possible to search systematically for molecular markers of cancer classification and outcome prediction in a variety of tumor types.

4. Serial Analysis of Gene Expression (SAGE)

Serial analysis of gene expression (SAGE) is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. First, a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many such tags are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g. Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997).
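The tag counting step can be illustrated with a deliberately simplified Python sketch. It assumes that a sequenced serial molecule is a plain concatenation of fixed-length 10-bp tags with no spacer or punctuation sequence between them, which real SAGE libraries do not satisfy; the function names, tag sequences, and tag-to-gene table are hypothetical.

```python
from collections import Counter


def count_tags(concatemer, tag_length=10):
    """Split a sequenced serial molecule into fixed-length tags and count them.

    Any trailing bases that do not fill a whole tag are ignored.
    """
    usable = len(concatemer) - len(concatemer) % tag_length
    tags = (concatemer[i:i + tag_length] for i in range(0, usable, tag_length))
    return Counter(tags)


def tag_abundance_by_gene(counts, tag_to_gene):
    """Sum tag counts per gene; tags without a known gene are pooled as 'unassigned'."""
    abundance = Counter()
    for tag, n in counts.items():
        abundance[tag_to_gene.get(tag, "unassigned")] += n
    return abundance


if __name__ == "__main__":
    # Hypothetical tags and lookup table.
    tag_a, tag_b, tag_c = "A" * 10, "C" * 10, "G" * 10
    concatemer = tag_a + tag_b + tag_a + tag_c + tag_a
    lookup = {tag_a: "gene1", tag_b: "gene2"}
    print(tag_abundance_by_gene(count_tags(concatemer), lookup))
```

The count obtained for each tag serves as the quantitative readout, and the lookup table stands in for the step of identifying the gene corresponding to each tag.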

5. Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS)

This method, described by Brenner et al., Nature Biotechnology 18:630-634 (2000), is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.

6. General Description of the mRNA Isolation, Purification and Amplification Methods of the Invention

The steps of a representative protocol of the invention, including mRNA isolation, purification, primer extension and amplification are illustrated in FIG. 1. As shown in FIG. 1, this representative process starts with cutting about 10 μm thick sections of paraffin-embedded tumor tissue samples. The RNA is then extracted, and protein and DNA are removed, following the method of the invention described below. After analysis of the RNA concentration, RNA repair and/or amplification steps may be included, if necessary, and RNA is reverse transcribed using gene specific primers followed by RT-PCR. Finally, the data are analyzed to identify the best treatment option(s) available to the patient on the basis of the characteristic gene expression pattern identified in the tumor sample examined. The individual steps of this protocol will be discussed in greater detail below.

7. Improved Method for Isolation of Nucleic Acid from Archived Tissue Specimens

As discussed above, in the first step of the method of the invention, total RNA is extracted from the source material of interest, including fixed, paraffin-embedded tissue specimens, and purified sufficiently to act as a substrate in an enzyme assay. Despite the availability of commercial products, and the extensive knowledge available concerning the isolation of nucleic acid, such as RNA, from tissues, isolation of nucleic acid (RNA) from fixed, paraffin-embedded tissue specimens (FPET) is not without difficulty.

In one aspect, the present invention concerns an improved method for the isolation of nucleic acid from archived, e.g. FPET tissue specimens. Measured levels of mRNA species are useful for defining the physiological or pathological status of cells and tissues. RT-PCR (which is discussed above) is one of the most sensitive, reproducible and quantitative methods for this “gene expression profiling”. Paraffin-embedded, formalin-fixed tissue is the most widely available material for such studies. Several laboratories have demonstrated that it is possible to successfully use fixed-paraffin-embedded tissue (FPET) as a source of RNA for RT-PCR (Stanta et al., Biotechniques 11:304-308 (1991); Stanta et al., Methods Mol. Biol. 86:23-26 (1998); Jackson et al., Lancet 1:1391 (1989); Jackson et al., J. Clin. Pathol. 43:499-504 (1999); Finke et al., Biotechniques 14:448-453 (1993); Goldsworthy et al., Mol. Carcinog. 25:86-91 (1999); Stanta and Bonin, Biotechniques 24:271-276 (1998); Godfrey et al., J. Mol. Diagnostics 2:84 (2000); Specht et al., J. Mol. Med. 78:B27 (2000); Specht et al., Am. J. Pathol. 158:419-429 (2001)). This allows gene expression profiling to be carried out on the most commonly available source of human biopsy specimens, and therefore potentially to create new valuable diagnostic and therapeutic information.

The most widely used protocols utilize hazardous organic solvents, such as xylene or octane (Finke et al., supra), to dewax the tissue in the paraffin blocks before nucleic acid (RNA and/or DNA) extraction. Obligatory organic solvent removal (e.g. with ethanol) and rehydration steps follow, which necessitate multiple manipulations and add substantial total time to the protocol, which can take up to several days. Commercial kits and protocols for RNA extraction from FPET [MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.); Paraffin Block RNA Isolation Kit (Ambion, Inc.) and RNeasy™ Mini kit (Qiagen, Chatsworth, Calif.)] use xylene for deparaffinization, in procedures which typically require multiple centrifugations and ethanol buffer changes, and incubations following incubation with xylene.

The present invention provides an improved nucleic acid extraction protocol that produces nucleic acid, in particular RNA, sufficiently intact for gene expression measurements. The key step in the nucleic acid extraction protocol herein is the performance of dewaxing without the use of any organic solvent, thereby eliminating the need for multiple manipulations associated with the removal of the organic solvent, and substantially reducing the total time of the protocol. According to the invention, wax, e.g. paraffin, is removed from wax-embedded tissue samples by incubation at 65-75° C. in a lysis buffer that solubilizes the tissue and hydrolyzes the protein, followed by cooling to solidify the wax.

FIG. 2 shows a flow chart of an RNA extraction protocol of the present invention in comparison with a representative commercial method, using xylene to remove wax. The times required for individual steps in the processes and for the overall processes are shown in the chart. As shown, the commercial process requires approximately 50% more time than the process of the invention.

The lysis buffer can be any buffer known for cell lysis. It is, however, preferred that oligo-dT-based methods of selectively purifying polyadenylated mRNA not be used to isolate RNA for the present invention, since the bulk of the mRNA molecules are expected to be fragmented and therefore will not have an intact polyadenylated tail, and will not be recovered or available for subsequent analytical assays. Otherwise, any number of standard nucleic acid purification schemes can be used. These include chaotrope and organic solvent extractions, extraction using glass beads or filters, salting out and precipitation based methods, or any of the purification methods known in the art to recover total RNA or total nucleic acids from a biological source.

Lysis buffers are commercially available, such as, for example, from Qiagen, Epicentre, or Ambion. A preferred group of lysis buffers typically contains urea, and Proteinase K or other protease. Proteinase K is very useful in the isolation of high quality, undamaged DNA or RNA, since most mammalian DNases and RNases are rapidly inactivated by this enzyme, especially in the presence of 0.5-1% sodium dodecyl sulfate (SDS). This is particularly important in the case of RNA, which is more susceptible to degradation than DNA. While DNases require metal ions for activity, and can therefore be easily inactivated by chelating agents, such as EDTA, there is no similar co-factor requirement for RNases.

Cooling and resultant solidification of the wax permits easy separation of the wax from the total nucleic acid, which can be conveniently precipitated, e.g. by isopropanol. Further processing depends on the intended purpose. If the proposed method of RNA analysis is subject to bias by contaminating DNA in an extract, the RNA extract can be further treated, e.g. by DNase, post purification to specifically remove DNA while preserving RNA. For example, if the goal is to isolate high quality RNA for subsequent RT-PCR amplification, nucleic acid precipitation is followed by the removal of DNA, usually by DNase treatment. However, DNA can be removed at various stages of nucleic acid isolation, by DNase or other techniques well known in the art.

While the advantages of the nucleic acid extraction protocol of the invention are most apparent for the isolation of RNA from archived, paraffin embedded tissue samples, the wax removal step of the present invention, which does not involve the use of an organic solvent, can also be included in any conventional protocol for the extraction of total nucleic acid (RNA and DNA) or DNA only. All of these aspects are specifically within the scope of the invention.

By using heat followed by cooling to remove paraffin, the process of the present invention saves valuable processing time, and eliminates a series of manipulations, thereby potentially increasing the yield of nucleic acid. Indeed, experimental evidence presented in the examples below demonstrates that the method of the present invention does not compromise RNA yield.

8. 5′-Multiplexed Gene Specific Priming of Reverse Transcription

RT-PCR requires reverse transcription of the test RNA population as a first step. The most commonly used primer for reverse transcription is oligo-dT, which works well when RNA is intact. However, this primer will not be effective when RNA is highly fragmented, as is the case in FPE tissues.

The present invention includes the use of gene specific primers, which are roughly 20 bases in length with a Tm optimum between about 58° C. and 60° C. These primers will also serve as the reverse primers that drive PCR DNA amplification.
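A candidate primer can be screened against these two criteria with a few lines of Python. The sketch below is an illustration only: the text does not state how Tm is calculated, so the rough Wallace rule, Tm = 2(A+T) + 4(G+C), is used here as an assumption, as is reading “roughly 20 bases” as 18 to 22 bases. The function names and the example sequences are hypothetical and do not come from Table 2.

```python
def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C).

    This is a coarse approximation; it ignores salt and primer
    concentration and nearest-neighbor effects.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def is_candidate(primer, length_range=(18, 22), tm_range=(58, 60)):
    """True if the primer length and estimated Tm fall within the given windows.

    The length window is an assumed reading of "roughly 20 bases";
    the Tm window follows the optimum stated in the text.
    """
    tm = wallace_tm(primer)
    return (length_range[0] <= len(primer) <= length_range[1]
            and tm_range[0] <= tm <= tm_range[1])


if __name__ == "__main__":
    # Hypothetical sequences, not primers from the invention.
    for primer in ("ATGCATGCATGCATGCATGC", "ATATATATATATATATATAT"):
        print(primer, wallace_tm(primer), is_candidate(primer))
```

In practice a nearest-neighbor thermodynamic model would give a more reliable estimate; the simple rule is used here only to keep the example self-contained.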

Another aspect of the invention is the inclusion of multiple gene-specific primers in the same reaction mixture. The number of such different primers can vary greatly and can be as low as two and as high as 40,000 or more. Table 2 displays examples of reverse primers that can be successfully used in carrying out the methods of the invention. FIG. 9 shows expression data obtained using this multiplexed gene-specific priming strategy. Specifically, FIG. 9 is a representation of the expression of 92 genes (a subset of genes listed in Table 1) across 70 FPE breast cancer specimens. The y-axis shows expression as cycle threshold times.

An alternative approach is based on the use of random hexamers as primers for cDNA synthesis. However, we have experimentally demonstrated that the method of using a multiplicity of gene-specific primers is superior to the known approach using random hexamers.

9. Preparation of Fragmented mRNA for Expression Profiling Assays

It is of interest to analyze the abundance of specific mRNA species in biological samples, since this expression profile provides an index of the physiological state of that sample. mRNA is notoriously difficult to extract and maintain in its native state; consequently, mRNA recovered from biological sources is often fragmented or somewhat degraded. This is especially true of human tissue specimens that have been chemically fixed and stored for extended periods of time.

In one aspect, the present invention provides a means of preparing the mRNA extracted from various sources, including archived tissue specimens, for expression profiling in a way that its relative abundance is preserved and the mRNAs of interest can be successfully measured. This method is useful as a means of preparing mRNA for analysis by any of the known expression profiling methods, including RT-PCR coupled with 5′ exonuclease cleavage of reporter probes (TaqMan® type assays), as discussed above, flap endonuclease assays (Cleavase® and Invader® type assays), oligonucleotide hybridization arrays, cDNA hybridization arrays, oligonucleotide ligation assays, 3′ single nucleotide extension assays and other assays designed to assess the abundance of specific mRNA sequences in a biological sample.

According to the method of the invention, total RNA is extracted from the source material and sufficiently purified to act as a substrate in an enzyme assay. The extraction procedure, including a new and improved way of removing the wax (e.g. paraffin) used for embedding the tissue samples, has been discussed above. It has also been noted that it is preferred that oligo-dT based methods of selectively purifying polyadenylated mRNA not be used to isolate RNA for this invention, since the bulk of the mRNA is expected to be fragmented, will not be polyadenylated and, therefore, will not be recovered and available for subsequent analytical assays if an oligo-dT based method is used.

A diagram of an improved method for repairing fragmented RNA is shown in FIG. 3. The fragmented RNA purified from the tissue sample is mixed with universal or gene-specific, single-stranded DNA templates for each mRNA species of interest. These templates may be full length DNA copies of the mRNA derived from cloned gene sources, they may be fragments of the gene representing only the segment of the gene to be assayed, or they may be a series of long oligonucleotides representing either the full length gene or the specific segment(s) of interest. The template can represent either a single consensus sequence or be a mixture of polymorphic variants of the gene. This DNA template, or scaffold, will preferably include one or more dUTP or rNTP sites in its length. This will provide a means of removing the template prior to carrying out subsequent analytical steps, to avoid its acting as a substrate or target in later analysis assays. In the case of dUTP, this removal is accomplished by treating the sample with uracil-DNA glycosylase (UDG) and heating it to cause strand breaks where UDG has generated abasic sites. In the case of rNTP's, the sample can be heated in the presence of a basic buffer (pH˜10) to induce strand breaks where rNTP's are located in the template.

The single stranded DNA template is mixed with the purified RNA, and the mixture is denatured and annealed so that the RNA fragments complementary to the DNA template effectively become primers that can be extended along the single stranded DNA templates. DNA polymerase I requires a primer for extension but will efficiently use either a DNA or an RNA primer. Therefore, in the presence of DNA polymerase I and dNTP's, the fragmented RNA can be extended along the complementary DNA templates. In order to increase the efficiency of the extension, this reaction can be thermally cycled, allowing overlapping templates and extension products to hybridize and extend until the overall population of fragmented RNA becomes represented as double stranded DNA extended from RNA fragment primers.

Following the generation of this “repaired” RNA, the sample should be treated with UDG or heat-treated in a mildly basic solution to fragment the DNA template (scaffold) and prevent it from participating in subsequent analytical reactions.

The product resulting from this enzyme extension can then be used as a template in a standard enzyme profiling assay that includes amplification and detectable signal generation such as fluorescent, chemiluminescent, colorimetric or other common read outs from enzyme based assays. For example, for TaqMan® type assays, this double stranded DNA product is added as the template in a standard assay; and, for array hybridization, this product acts as the cDNA template for the cRNA labeling reaction typically used to generate single-stranded, labeled RNA for array hybridization.

This method of preparing template has the advantage of recovering information from mRNA fragments too short to effectively act as templates in standard cDNA generation schemes. In addition, this method acts to preserve the specific locations in mRNA sequences targeted by specific analysis assays. For example, TaqMan® assays rely on a single contiguous sequence in a cDNA copy of mRNA to act as a PCR amplification template targeted by a labeled reporter probe. If mRNA strand breaks occur in this sequence, the assay will not detect that template and will underestimate the quantity of that mRNA in the original sample. This target preparation method minimizes that effect from RNA fragmentation.

The amount of extension product formed in the RNA primer extension assay can be controlled by controlling the input quantity of the single stranded DNA template and by doing limited cycling of the extension reaction. This is important in preserving the relative abundance of the mRNA sequences targeted for analysis.

This method has the added advantage of not requiring parallel preparation for each target sequence since it is easily multiplexed. It is also possible to use large pools of random sequence long oligonucleotides or full libraries of cloned sequences to extend the entire population of mRNA sequences in the sample extract for whole expressed genome analysis rather than targeted gene specific analysis.

10. Amplification of mRNA Species Prior to RT-PCR

Due to the limited amount and poor quality of mRNA that can be isolated from FPET, a new procedure that could accurately amplify mRNAs of interest would be very useful, particularly for real time quantitation of gene expression (TaqMan®) and especially for quantitation of a large number of genes (from more than 50 up to 10,000).

Current protocols (e.g. Eberwine, Biotechniques 20:584-91 (1996)) are optimized for mRNA amplification from small amounts of total or poly A⁺ RNA, mainly for microarray analysis. The present invention provides a protocol optimized for amplification of small amounts of fragmented total RNA (average size about 60-150 bps), utilizing gene-specific sequences as primers, as illustrated in FIG. 4.

The amplification procedure of the invention uses a very large number, typically as many as 100-190,000 gene specific primers (GSP's) in one reverse transcription run. Each GSP contains an RNA polymerase promoter, e.g. a T7 DNA-dependent RNA polymerase promoter, at the 5′ end for subsequent RNA amplification. GSP's are preferred as primers because of the small size of the RNA. Current protocols utilize dT primers, which would not adequately represent all reverse transcripts of mRNAs due to the small size of the FPET RNA. GSP's can be designed by optimizing usual parameters, such as length, Tm, etc. For example, GSP's can be designed using the Primer Express® (Applied Biosystems) or Primer 3 (MIT) software program. Typically at least 3 sets per gene are designed, and the ones giving the lowest Ct on FPET RNA (best performers) are selected.

Second strand cDNA synthesis is performed by standard procedures (see FIG. 4, Method 1), or by GSP_(f) primers and Taq pol under PCR conditions (e.g., 95° C., 10 min (Taq activation) then 60° C., 45 sec). The advantages of the latter method are that the second gene specific primer, GSP_(f), adds additional specificity (and potentially more efficient second strand synthesis) and that it provides the option of performing several cycles of PCR, if more starting DNA is necessary for RNA amplification by T7 RNA polymerase. RNA amplification is then performed under standard conditions to generate multiple copies of cRNA, which is then used in a standard TaqMan® reaction.

Although this process is illustrated by using T7-based RNA amplification, a person skilled in the art will understand that other RNA polymerase promoters that do not require a primer, such as T3 or Sp6, can also be used, and are within the scope of the invention.

11. A Method of Elongation of Fragmented RNA and Subsequent Amplification

This method, which combines and modifies the inventions described in sections 9 and 10 above, is illustrated in FIG. 5. The procedure begins with elongation of fragmented mRNA. This occurs as described above except that the scaffold DNAs are tagged with the T7 RNA polymerase promoter sequence at their 5′ ends, leading to double-stranded DNA extended from RNA fragments. The template sequences need to be removed after in vitro transcription. These templates can include dUTP or rNTP nucleotides, enabling enzymatic removal of the templates as described in section 9, or the templates can be removed by DNaseI treatment.

The template DNA can be a population representing different mRNAs of any number. A high sequence complexity source of DNA templates (scaffolds) can be generated by pooling RNA from a variety of cells or tissues. In one embodiment, these RNAs are converted into double stranded DNA and cloned into phagemids. Single stranded DNA can then be rescued by phagemid growth and single stranded DNA isolation from purified phagemids.

This invention is useful because it increases gene expression profile signals in two different ways: both by increasing test mRNA polynucleotide sequence length and by in vitro transcription amplification. An additional advantage is that it eliminates the need to carry out reverse transcription optimization with gene specific primers tagged with the T7 RNA polymerase promoter sequence, and thus is comparatively fast and economical.

This invention can be used with a variety of different methods to profile gene expression, e.g., RT-PCR or a variety of DNA array methods. Just as in the previous protocol, this approach is illustrated by using a T7 promoter, but the invention is not so limited. A person skilled in the art will appreciate that other RNA polymerase promoters, such as T3 or Sp6, can also be used.

12. Breast Cancer Gene Set, Assayed Gene Subsequences, and Clinical Application of Gene Expression Data

An important aspect of the present invention is to use the measured expression of certain genes by breast cancer tissue to match patients to the best drugs or drug combinations, and to provide prognostic information. For this purpose it is necessary to correct for (normalize away) both differences in the amount of RNA assayed and variability in the quality of the RNA used. Therefore, the assay measures and incorporates the expression of certain normalizing genes, including well known housekeeping genes, such as GAPDH and Cyp1. Alternatively, normalization can be based on the mean or median signal (Ct) of all of the assayed genes or a large subset thereof (global normalization approach). On a gene-by-gene basis, the measured normalized amount of a patient tumor mRNA is compared to the amount found in a breast cancer tissue reference set. The number (N) of breast cancer tissues in this reference set should be sufficiently high to ensure that different reference sets (as a whole) behave essentially the same way. If this condition is met, the identity of the individual breast cancer tissues present in a particular set will have no significant impact on the relative amounts of the genes assayed. Usually, the breast cancer tissue reference set consists of at least about 30, preferably at least about 40, different FPE breast cancer tissue specimens. Unless noted otherwise, normalized expression levels for each mRNA/tested tumor/patient will be expressed as a percentage of the expression level measured in the reference set. More specifically, the reference set of a sufficiently high number (e.g. 40) of tumors yields a distribution of normalized levels of each mRNA species. The level measured in a particular tumor sample to be analyzed falls at some percentile within this range, which can be determined by methods well known in the art. Below, unless noted otherwise, reference to expression levels of a gene assumes normalized expression relative to the reference set, although this is not always explicitly stated.
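By way of illustration only, the normalization and percentile placement described above can be sketched as follows. The text does not prescribe a particular formula; this sketch assumes that the normalized level is the mean Ct of the normalizing genes minus the Ct of the test gene (so that higher values mean more mRNA), and that a percentile is the percentage of reference set tumors at or below the measured level. The function names and the example Ct values are hypothetical.

```python
from statistics import mean


def normalize_ct(sample_ct, normalizing_genes=("GAPDH", "Cyp1")):
    """Return normalized levels for the test genes of one specimen.

    Assumption (not stated in the text): normalized level =
    mean Ct of the normalizing genes - Ct of the test gene.
    A lower Ct means more template, so a higher value means more mRNA.
    """
    reference_ct = mean(sample_ct[g] for g in normalizing_genes)
    return {
        gene: reference_ct - ct
        for gene, ct in sample_ct.items()
        if gene not in normalizing_genes
    }


def percentile_in_reference(level, reference_levels):
    """Percentage of reference set tumors whose level is at or below `level`."""
    at_or_below = sum(1 for v in reference_levels if v <= level)
    return 100.0 * at_or_below / len(reference_levels)


# Hypothetical Ct values for one tumor specimen.
tumor = normalize_ct({"GAPDH": 24.0, "Cyp1": 26.0, "TS": 28.0})

# Hypothetical normalized TS levels from a (very small) reference set.
reference_ts = [-5.0, -4.0, -3.0, -2.0, -1.0]
ts_percentile = percentile_in_reference(tumor["TS"], reference_ts)
```

In practice the reference set would contain at least about 30 to 40 specimens, as stated above; the five values here only keep the example short. The global normalization alternative would replace the normalizing-gene mean with the mean or median Ct of all assayed genes.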

The breast cancer gene set is shown in Table 1. The gene Accession Numbers, and the SEQ ID NOs for the forward primer, reverse primer and amplicon sequences that can be used for gene amplification, are listed in Table 2. The basis for inclusion of markers, as well as the clinical significance of mRNA level variations with respect to the reference set, is indicated below. Genes are grouped into subsets based on the type of clinical significance indicated by their expression levels: A. Prediction of patient response to drugs used in breast cancer treatment, or to drugs that are approved for other indications and could be used off-label in the treatment of breast cancer. B. Prognostic for survival or recurrence of cancer.

C. Prediction of Patient Response to Therapeutic Drugs

1. Molecules that Specifically Influence Cellular Sensitivity to Drugs

Table 1 lists 74 genes (shown in italics) that specifically influence cellular sensitivity to potent drugs, which are also listed. Most of the drugs shown are approved and already used to treat breast cancer (e.g., anthracyclines; cyclophosphamide; methotrexate; 5-FU and analogues). Several of the drugs are used to treat breast cancer off-label or are in clinical development phase (e.g., bisphosphonates and anti-VEGF mAb). Several of the drugs have not been widely used to treat breast cancer but are used in other cancers in which the indicated target is expressed (e.g., Celebrex is used to treat familial colon cancer; cisplatin is used to treat ovarian and other cancers).

Patient response to 5-FU is indicated if the normalized thymidylate synthase mRNA amount is at or below the 15^(th) percentile, or the sum of expression of thymidylate synthase plus dihydropyrimidine phosphorylase is at or below the 25^(th) percentile, or the sum of expression of these mRNAs plus thymidine phosphorylase is at or below the 20^(th) percentile. Patients with dihydropyrimidine dehydrogenase below the 5^(th) percentile are at risk of adverse response to 5-FU, or analogs such as Xeloda.
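The percentile criteria of the preceding paragraph can be written as a simple decision rule. The following is a minimal sketch only: it assumes that the percentile of each single marker and of each summed signal has already been determined against the reference set, and the function and parameter names are hypothetical.

```python
def five_fu_call(ts_pct, ts_plus_dpp_pct, ts_dpp_tp_pct, dpd_pct):
    """Apply the 5-FU percentile criteria stated in the text.

    ts_pct           percentile of thymidylate synthase
    ts_plus_dpp_pct  percentile of the summed signal of thymidylate synthase
                     plus dihydropyrimidine phosphorylase
    ts_dpp_tp_pct    percentile of that sum plus thymidine phosphorylase
    dpd_pct          percentile of dihydropyrimidine dehydrogenase
    """
    response_indicated = (
        ts_pct <= 15.0
        or ts_plus_dpp_pct <= 25.0
        or ts_dpp_tp_pct <= 20.0
    )
    adverse_risk = dpd_pct < 5.0
    return {"response_indicated": response_indicated, "adverse_risk": adverse_risk}


# Hypothetical tumor: low thymidylate synthase, unremarkable DPD.
call = five_fu_call(ts_pct=10.0, ts_plus_dpp_pct=50.0, ts_dpp_tp_pct=50.0, dpd_pct=40.0)
```

The two outputs are kept separate because a tumor can meet a response criterion while the patient is also at risk of an adverse response.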

When levels of thymidylate synthase and dihydropyrimidine dehydrogenase are within the acceptable range as defined in the preceding paragraph, amplification of c-myc mRNA in the upper 15%, against a background of wild-type p53 [as defined below], predicts a beneficial response to 5-FU (see D. Arango et al., Cancer Res. 61:4910-4915 (2001)). In the presence of normal levels of thymidylate synthase and dihydropyrimidine dehydrogenase, levels of NFκB and cIAP2 in the upper 10% indicate resistance of breast tumors to the chemotherapeutic drug 5-FU.

Patient resistance to anthracyclines is indicated if the normalized mRNA level of topoisomerase IIα is below the 10^(th) percentile, or if the topoisomerase IIβ normalized mRNA level is below the 10^(th) percentile, or if the combined normalized topoisomerase IIα and β signals are below the 10^(th) percentile.

Patient sensitivity to methotrexate is compromised if DHFR levels are more than tenfold higher than the average reference set level for this mRNA species, or if reduced folate carrier levels are below the 10^(th) percentile.
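Criteria stated as a fold difference from the reference set average, such as the DHFR criterion above, require converting Ct-based values to a linear scale. The sketch below is one possible way to do this and is not taken from the text: it assumes normalized levels expressed in Ct units (higher meaning more mRNA) and an approximately twofold increase in product per PCR cycle. All names and values are hypothetical.

```python
def fold_over_reference_mean(level, reference_levels):
    """Linear fold difference between a tumor and the reference set average.

    Assumption: levels are in Ct units (normalizer Ct minus gene Ct), and
    each Ct unit corresponds to a twofold difference in template amount.
    """
    reference_mean = sum(2.0 ** v for v in reference_levels) / len(reference_levels)
    return (2.0 ** level) / reference_mean


def methotrexate_compromised(dhfr_fold, rfc_percentile):
    """Criteria from the text: DHFR more than tenfold above the reference
    average, or reduced folate carrier below the 10th percentile."""
    return dhfr_fold > 10.0 or rfc_percentile < 10.0


# Hypothetical tumor whose DHFR level is 4 Ct units above a flat reference set.
dhfr_fold = fold_over_reference_mean(4.0, [0.0, 0.0])
flag = methotrexate_compromised(dhfr_fold, rfc_percentile=50.0)
```

If amplification efficiency is measurably below twofold per cycle, the base of the exponent would need to be adjusted accordingly.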

Patients whose tumors express CYP1B1 in the upper 10% have reduced likelihood of responding to docetaxel.

The sum of signals for aldehyde dehydrogenase 1A1 and 1A3, when more than tenfold higher than the reference set average, indicates reduced likelihood of response to cyclophosphamide.

Currently, estrogen and progesterone receptor expression as measured by immunohistochemistry is used to select patients for anti-estrogen therapy. We have demonstrated RT-PCR assays for estrogen and progesterone receptor mRNA levels that predict levels of these proteins as determined by standard clinical diagnostic tests, with a high degree of concordance (FIGS. 6 and 7).

Patients whose tumors express ERα or PR mRNA in the upper 70% are likely to respond to tamoxifen or other anti-estrogens (thus, operationally, lower levels of ERα than this are to be defined as ERα-negative). However, when the signal for microsomal epoxide hydrolase is in the upper 10%, or when mRNAs for pS2/trefoil factor, GATA3 or human chorionic gonadotropin are at or below average levels found in ERα-negative tumors, anti-estrogen therapy will not be beneficial.

Absence of XIST signal compromises the likelihood of response to taxanes, as does elevation of the GST-π or prolyl endopeptidase [PREP] signal in the upper 10%. Elevation of PLAG1 in the upper 10% decreases sensitivity to taxanes.

Expression of ERCC1 mRNA in the upper 10% indicates significant risk of resistance to cisplatin or analogs.

An RT-PCR assay of Her2 mRNA expression predicts Her2 overexpression as measured by a standard diagnostic test, with a high degree of concordance (data not shown). Patients whose tumors express Her2 (normalized to Cyp1) in the upper 10% have increased likelihood of beneficial response to treatment with Herceptin or other ErbB2 antagonists. Measurement of expression of Grb7 mRNA serves as a test for HER2 gene amplification, because the Grb7 gene is closely linked to Her2. When Her2 expression is high as defined above in this paragraph, similarly elevated Grb7 indicates Her2 gene amplification. Overexpression of IGF1R and/or IGF1 or IGF2 decreases likelihood of beneficial response to Herceptin and also to EGFR antagonists.

Patients whose tumors express mutant Ha-Ras, and also express farnesyl pyrophosphate synthetase or geranyl pyrophosphate synthetase mRNAs at levels above the tenth percentile, comprise a group that is especially likely to exhibit a beneficial response to bisphosphonate drugs.

Cox2 is a key control enzyme in the synthesis of prostaglandins. It is frequently expressed at elevated levels in subsets of various types of carcinomas, including carcinoma of the breast. Expression of this gene is controlled at the transcriptional level, so RT-PCR serves as a valid indicator of the cellular enzyme activity. Nonclinical research has shown that cox2 promotes tumor angiogenesis, suggesting that this enzyme is a promising drug target in solid tumors. Several Cox2 antagonists are marketed products for use in anti-inflammatory conditions. Treatment of familial adenomatous polyposis patients with the cox2 inhibitor Celebrex significantly decreased the number and size of neoplastic polyps. No cox2 inhibitor has yet been approved for treatment of breast cancer, but generally this class of drugs is safe and could be prescribed off-label in breast cancers in which cox2 is over-expressed. Tumors expressing COX2 at levels in the upper ten percentile have increased chance of beneficial response to Celebrex or other cyclooxygenase 2 inhibitors.

The tyrosine kinases ErbB1 [EGFR], ErbB3 [Her3] and ErbB4 [Her4]; also the ligands TGFalpha, amphiregulin, heparin-binding EGF-like growth factor, and epiregulin; also BRK, a non-receptor kinase. Several drugs in clinical development block the EGF receptor. ErbB2-4, the indicated ligands, and BRK also increase the activity of the EGFR pathway. Breast cancer patients whose tumors express high levels of EGFR, or EGFR and abnormally high levels of the other indicated activators of the EGFR pathway, are potential candidates for treatment with an EGFR antagonist.

Patients whose tumors express less than 10% of the average level of EGFR mRNA observed in the reference panel are relatively less likely to respond to EGFR antagonists [such as Iressa, or ImClone 225]. In cases in which the EGFR is above this low range, the additional presence of epiregulin, TGFα, amphiregulin, or ErbB3, or BRK, CD9, MMP9, or Lot1 at levels above the 90^(th) percentile predisposes to response to EGFR antagonists. Epiregulin gene expression, in particular, is a good surrogate marker for EGFR activation, and can be used not only to predict response to EGFR antagonists, but also to monitor response to EGFR antagonists [taking fine needle biopsies to provide tumor tissue during treatment]. Levels of CD82 above the 90^(th) percentile suggest poorer efficacy from EGFR antagonists.

The tyrosine kinases abl, c-kit, PDGFRalpha, PDGFRbeta, and ARG; also, the signal transmitting ligands c-kit ligand, PDGFA, B, C and D. The listed tyrosine kinases are all targets of the drug Gleevec™ (imatinib mesylate, Novartis), and the listed ligands stimulate one or more of the listed tyrosine kinases. In the two indications for which Gleevec™ is approved, tyrosine kinase targets (bcr-abl and c-kit) are overexpressed and also contain activating mutations. A finding that one of the Gleevec™ target tyrosine kinases is expressed in breast cancer tissue will prompt a second stage of analysis wherein the gene will be sequenced to determine whether it is mutated. That a mutation found is an activating mutation can be proved by methods known in the art, such as, for example, by measuring kinase enzyme activity or by measuring phosphorylation status of the particular kinase, relative to the corresponding wild-type kinase. Breast cancer patients whose tumors express high levels of mRNAs encoding Gleevec™ target tyrosine kinases, specifically, in the upper ten percentile, or mRNAs for Gleevec™ target tyrosine kinases in the average range and mRNAs for their cognate growth stimulating ligands in the upper ten percentile, are particularly good candidates for treatment with Gleevec™.

VEGF is a potent and pathologically important angiogenic factor. (See below under Prognostic Indicators.) When VEGF mRNA levels are in the upper ten percentile, aggressive treatment is warranted. Such levels particularly suggest the value of treatment with anti-angiogenic drugs, including VEGF antagonists, such as anti-VEGF antibodies. Additionally, KDR or CD31 mRNA level in the upper 20 percentile further increases likelihood of benefit from VEGF antagonists.

Farnesyl pyrophosphate synthetase and geranyl geranyl pyrophosphate synthetase. These enzymes are targets of commercialized bisphosphonate drugs, which were developed originally for treatment of osteoporosis but recently have begun to be prescribed off-label in breast cancer. Elevated levels of mRNAs encoding these enzymes in breast cancer tissue, above the 90^(th) percentile, suggest use of bisphosphonates as a treatment option.

2. Multidrug Resistance Factors

These factors include 10 genes: gamma glutamyl cysteine synthetase [GCS]; GST-α; GST-π; MDR-1; MRP1-4; breast cancer resistance protein [BCRP]; lung resistance protein [MVP]; SXR; YB-1.

GCS and both GST-α and GST-π regulate glutathione levels, which decrease cellular sensitivity to chemotherapeutic drugs and other toxins by reductive derivatization. Glutathione is a necessary cofactor for multi-drug resistant pumps, MDR-1 and the MRPs. MDR1 and MRPs function to actively transport out of cells several important chemotherapeutic drugs used in breast cancer.

GSTs, MDR-1, and MRP-1 have all been studied extensively to determine whether they have prognostic or predictive significance in human cancer. However, a great deal of disagreement exists in the literature with respect to these questions. Recently, new members of the MRP family have been identified: MRP-2, MRP-3, MRP-4, BCRP, and lung resistance protein [major vault protein]. These have substrate specificities that overlap with those of MDR-1 and MRP-1. The incorporation of all of these relevant ABC family members as well as glutathione synthetic enzymes into the present invention captures the contribution of this family to drug resistance, in a way that single or double analyte assays cannot.

MRP-1, the gene coding for the multidrug resistance protein P-glycoprotein, is not regulated primarily at the transcriptional level. However, p-glycoprotein stimulates the transcription of PTP1b. An embodiment of the present invention is the use of the level of the mRNA for the phosphatase PTP1b as a surrogate measure of MRP-1/p-glycoprotein activity.

The gene SXR is also an activator of multidrug resistance, as it stimulates transcription of certain multidrug resistance factors.

The impact of multidrug resistance factors with respect to chemotherapeutic agents used in breast cancer is as follows. Beneficial response to doxorubicin is compromised when the mRNA levels of any of MDR1, GSTα, GSTπ, SXR, BCRP, YB-1, or LRP/MVP are in the upper four percentile. Beneficial response to methotrexate is inhibited if mRNA levels of any of MRP1, MRP2, MRP3, or MRP4 or gamma-glutamyl cysteine synthetase are in the upper four percentile.
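Because each of these criteria is met when any one of several markers is elevated, they lend themselves to a simple multi-analyte check. The sketch below is illustrative only. It assumes that "upper four percentile" means a percentile above 96 in the reference set distribution, and that percentiles have already been computed for each gene; the marker spellings, function names and example values are hypothetical.

```python
DOXORUBICIN_MARKERS = ("MDR1", "GST-alpha", "GST-pi", "SXR", "BCRP", "YB-1", "LRP/MVP")
METHOTREXATE_MARKERS = ("MRP1", "MRP2", "MRP3", "MRP4", "GCS")


def markers_in_upper_tail(percentiles, markers, upper_tail_percent=4.0):
    """Return the markers whose percentile lies in the upper tail.

    Assumption: "upper four percentile" is read as percentile > 96.
    Markers that were not measured are skipped rather than guessed.
    """
    cutoff = 100.0 - upper_tail_percent
    return [m for m in markers if m in percentiles and percentiles[m] > cutoff]


def resistance_flags(percentiles):
    """List, per drug, the markers that meet the criterion in the text."""
    return {
        "doxorubicin": markers_in_upper_tail(percentiles, DOXORUBICIN_MARKERS),
        "methotrexate": markers_in_upper_tail(percentiles, METHOTREXATE_MARKERS),
    }


# Hypothetical percentiles for one tumor.
flags = resistance_flags({"MDR1": 98.0, "SXR": 50.0, "MRP2": 96.0})
```

A non-empty list for a drug indicates that beneficial response to that drug is compromised under the stated criteria; returning the marker names rather than a bare yes/no keeps the basis for the call visible.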

3. Eukaryotic Translation Initiation Factor 4E [EIF4E]

EIF4E mRNA levels provide evidence of protein expression and so expand the capability of RT-PCR to indicate variation in gene expression. Thus, one claim of the present invention is the use of EIF4E as an added indicator of gene expression of certain genes [e.g., cyclinD1, mdm2, VEGF, and others]. For example, in two tissue specimens containing the same amount of normalized VEGF mRNA, it is likely that the tissue containing the higher normalized level of EIF4E exhibits the greater level of VEGF gene expression.

The background is as follows. A key point in the regulation of mRNA translation is selection of mRNAs by the EIF4G complex to bind to the 43S ribosomal subunit. The protein EIF4E [the m7G CAP-binding protein] is often limiting because more mRNAs than EIF4E copies exist in cells. mRNAs with highly structured or highly GC-rich 5′UTRs are inefficiently translated, and these often code for proteins that carry out functions relevant to cancer [e.g., cyclinD1, mdm2, and VEGF]. EIF4E is itself regulated at the transcriptional/mRNA level. Thus, expression of EIF4E provides added indication of increased activity of a number of proteins.

It is also noteworthy that overexpression of EIF4E transforms cultured cells, and hence EIF4E is an oncogene. Overexpression of EIF4E occurs in several different types of carcinomas but is particularly significant in breast cancer. EIF4E is typically expressed at very low levels in normal breast tissue.

D. Prognostic Indicators

1. DNA Repair Enzymes

Loss of BRCA1 or BRCA2 activity via mutation represents the critical oncogenic step in the most common type[s] of familial breast cancer. The levels of mRNAs of these important enzymes are abnormal in subsets of sporadic breast cancer as well. Loss of signals from either [to within the lower ten percentile] heightens risk of short survival.

2. Cell Cycle Regulators

Cell cycle regulators include 14 genes: c-MYC; c-Src; Cyclin D1; Ha-Ras; mdm2; p14ARF; p21WAF1/CIP; p16INK4a/p14; p23; p27; p53; PI3K; PKC-epsilon; PKC-delta.

The gene for p53 [TP53] is mutated in a large fraction of breast cancers. Frequently p53 levels are elevated when loss of function mutation occurs. When the mutation is dominant-negative, it creates survival value for the cancer cell because growth is promoted and apoptosis is inhibited. Thousands of different p53 mutations have been found in human cancer, and the functional consequences of many of them are not clear. A large body of academic literature addresses the prognostic and predictive significance of mutated p53, and the results are highly conflicting. The present invention provides a functional genomic measure of p53 activity, as follows. The activated wild type p53 molecule triggers transcription of the cell cycle inhibitor p21. Thus, the ratio of p53 to p21 should be low when p53 is wild-type and activated. When p53 is detectable and the ratio of p53 to p21 is elevated in tumors relative to normal breast, it signifies nonfunctional or dominant negative p53. The cancer literature provides evidence for this, as borne out by poor prognosis.

Mdm2 is an important p53 regulator. Activated wildtype p53 stimulates transcription of mdm2. The mdm2 protein binds p53 and promotes its proteolytic destruction. Thus, abnormally low levels of mdm2 in the presence of normal or higher levels of p53 indicate that p53 is mutated and inactivated.

One aspect of the present invention is the use of ratios of mRNA levels p53:p21 and p53:mdm2 to provide a picture of p53 status. Evidence for dominant negative mutation of p53 (as indicated by high p53:p21 and/or high p53:mdm2 mRNA ratios, specifically in the upper ten percentile) presages higher risk of recurrence in breast cancer and therefore weighs toward a decision to use chemotherapy in node negative post surgery breast cancer.
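The ratio-based assessment of p53 status can be illustrated as follows. This is a minimal sketch under stated assumptions, not a prescribed implementation: normalized levels are taken to be in Ct units (higher meaning more mRNA), so that a ratio of two mRNAs corresponds to the difference of their normalized levels (a log2 ratio, assuming twofold amplification per cycle), and "upper ten percentile" is read as a percentile above 90 within the reference set distribution of the same ratio. Names and values are hypothetical.

```python
def log2_ratio(levels, numerator, denominator):
    """Log2 ratio of two mRNAs from normalized levels in Ct units."""
    return levels[numerator] - levels[denominator]


def ratio_percentile(tumor, reference_set, numerator, denominator):
    """Percentage of reference tumors whose ratio is at or below the tumor's."""
    tumor_ratio = log2_ratio(tumor, numerator, denominator)
    reference_ratios = [log2_ratio(r, numerator, denominator) for r in reference_set]
    at_or_below = sum(1 for v in reference_ratios if v <= tumor_ratio)
    return 100.0 * at_or_below / len(reference_ratios)


def elevated_p53_ratios(tumor, reference_set, cutoff=90.0):
    """Return which of p53:p21 and p53:mdm2 fall in the upper ten percentile."""
    flagged = []
    for partner in ("p21", "mdm2"):
        if ratio_percentile(tumor, reference_set, "p53", partner) > cutoff:
            flagged.append("p53:" + partner)
    return flagged


# Hypothetical reference set of ten tumors with log2 ratios 0..9 for both pairs.
reference_set = [{"p53": float(i), "p21": 0.0, "mdm2": 0.0} for i in range(10)]
tumor = {"p53": 10.0, "p21": 0.0, "mdm2": 5.0}
flagged = elevated_p53_ratios(tumor, reference_set)
```

A non-empty result would be read, per the text, as evidence for dominant negative p53; the text additionally requires that p53 itself be detectable, which this sketch does not check.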

Another important cell cycle regulator is p27, which in the activated form blocks cell cycle progression at the level of cdk4. The protein is regulated primarily via phosphorylation/dephosphorylation, rather than at the transcriptional level. However, levels of p27 mRNAs do vary. Therefore a level of p27 mRNA in the upper ten percentile indicates reduced risk of recurrence of breast cancer post surgery.

Cyclin D1 is a principal positive regulator of entry into S phase of the cell cycle. The gene for cyclin D1 is amplified in about 20% of breast cancer patients, and therefore promotes tumor growth in those cases. One aspect of the present invention is use of cyclin D1 mRNA levels for diagnostic purposes in breast cancer. A level of cyclin D1 mRNA in the upper ten percentile suggests high risk of recurrence in breast cancer following surgery and suggests particular benefit of adjuvant chemotherapy.

3. Other Tumor Suppressors and Related Proteins

These include APC and E-cadherin. It has long been known that the tumor suppressor APC is lost in about 50% of colon cancers, with concomitant transcriptional upregulation of E-cadherin, an important cell adhesion molecule and growth suppressor. Recently, it has been found that the APC gene is silenced in 15-40% of breast cancers. Likewise, the E-cadherin gene is silenced [via CpG island methylation] in about 30% of breast cancers. An abnormally low level of APC and/or E-cadherin mRNA in the lower 5 percentile suggests high risk of recurrence in breast cancer following surgery and heightened risk of shortened survival.

4. Regulators of Apoptosis

These include Bcl2/BAX family members Bcl2, Bcl-xL, Bak, Bax and related factors, NFκ-B and related factors, and also p53BP1/ASPP1 and p53BP2/ASPP2.

Bax and Bak are pro-apoptotic and Bcl2 and Bcl-xL are anti-apoptotic. Therefore, the ratios of these factors influence the resistance or sensitivity of a cell to toxic (pro-apoptotic) drugs. In breast cancer, unlike other cancers, elevated level of Bcl2 (in the upper ten percentile) correlates with good outcome. This reflects the fact that Bcl2 has growth inhibitory activity as well as anti-apoptotic activity, and in breast cancer the significance of the former activity outweighs the significance of the latter. The impact of Bcl2 is in turn dependent on the status of the growth stimulating transcription factor c-MYC. The gene for c-MYC is amplified in about 20% of breast cancers. When c-MYC message levels are abnormally elevated relative to Bcl2 (such that this ratio is in the upper ten percentile), then elevated level of Bcl2 mRNA is no longer a positive indicator.

NFκ-B is another important anti-apoptotic factor. Originally recognized as a pro-inflammatory transcription factor, it is now clear that it prevents programmed cell death in response to several extracellular toxic factors [such as tumor necrosis factor]. The activity of this transcription factor is regulated principally via phosphorylation/dephosphorylation events. However, levels of NFκ-B nevertheless do vary from cell to cell, and elevated levels should correlate with increased resistance to apoptosis. Importantly for present purposes, NFκ-B exerts its anti-apoptotic activity largely through its stimulation of transcription of mRNAs encoding certain members of the IAP [inhibitor of apoptosis] family of proteins, specifically cIAP1, cIAP2, XIAP, and Survivin. Thus, abnormally elevated levels of mRNAs for these IAPs and for NFκ-B [any in the upper 5 percentile] signify activation of the NFκ-B anti-apoptotic pathway. This suggests high risk of recurrence in breast cancer following chemotherapy and therefore poor prognosis. One embodiment of the present invention is the inclusion in the gene set of the above apoptotic regulators, and the above-outlined use of combinations and ratios of the levels of their mRNAs for prognosis in breast cancer.

The proteins p53BP1 and 2 bind to p53 and promote transcriptional activation of pro-apoptotic genes. The levels of p53BP1 and 2 are suppressed in a significant fraction of breast cancers, correlating with poor prognosis. When either is expressed in the lower tenth percentile, poor prognosis is indicated.

5. Factors that Control Cell Invasion and Angiogenesis

These include uPA, PAI1, cathepsins B, G and L, scatter factor [HGF], c-met, KDR, VEGF, and CD31. The plasminogen activator uPA and its serpin regulator PAI1 promote breakdown of extracellular matrices and tumor cell invasion. Abnormally elevated levels of both mRNAs in malignant breast tumors (in the upper twenty percentile) signify an increased risk of shortened survival, increased recurrence in breast cancer patients post surgery, and increased importance of receiving adjuvant chemotherapy. On the other hand, node negative patients whose tumors do not express elevated levels of these mRNA species are less likely to have recurrence of this cancer and could more seriously consider whether the benefits of standard chemotherapy justify the associated toxicity.

Cathepsins B or L, when expressed in the upper ten percentile, predict poor disease-free and overall survival. In particular, cathepsin L predicts short survival in node positive patients.

Scatter factor and its cognate receptor c-met promote cell motility and invasion, cell growth, and angiogenesis. In breast cancer elevated levels of mRNAs encoding these factors should prompt aggressive treatment with chemotherapeutic drugs, when expression of either, or the combination, is above the 90th percentile.

VEGF is a central positive regulator of angiogenesis, and elevated levels in solid tumors predict short survival [note many references showing that elevated level of VEGF predicts short survival]. Inhibitors of VEGF therefore slow the growth of solid tumors in animals and humans. VEGF activity is controlled at the level of transcription. VEGF mRNA levels in the upper ten percentile indicate significantly worse than average prognosis. Other markers of vascularization, CD31 [PECAM], and KDR indicate high vessel density in tumors and that the tumor will be particularly malignant and aggressive, and hence that an aggressive therapeutic strategy is warranted.

6. Markers for Immune and Inflammatory Cells and Processes

These markers include the genes for Immunoglobulin light chain λ, CD18, CD3, CD68, Fas [CD95], and Fas Ligand.

Several lines of evidence suggest that the mechanisms of action of certain drugs used in breast cancer entail activation of the host immune/inflammatory response (for example, Herceptin®). One aspect of the present invention is the inclusion in the gene set of markers for inflammatory and immune cells, and markers that predict tumor resistance to immune surveillance. Immunoglobulin light chain lambda is a marker for immunoglobulin producing cells. CD18 is a marker for all white cells. CD3 is a marker for T-cells. CD68 is a marker for macrophages.

CD95 and Fas ligand are a receptor:ligand pair that mediates one of two major pathways by which cytotoxic T cells and NK cells kill targeted cells. Decreased expression of CD95 and increased expression of Fas Ligand indicate poor prognosis in breast cancer. Both CD95 and Fas Ligand are transmembrane proteins, and need to be membrane anchored to trigger cell death. Certain tumor cells produce a truncated soluble variant of CD95, created as a result of alternative splicing of the CD95 mRNA. This blocks NK cell and cytotoxic T cell Fas Ligand-mediated killing of the tumor cells. Presence of soluble CD95 correlates with poor survival in breast cancer. The gene set includes both soluble and full-length variants of CD95.

7. Cell Proliferation Markers

The gene set includes the cell proliferation markers Ki67/MiB1, PCNA, Pin1, and thymidine kinase. High levels of expression of proliferation markers associate with high histologic grade, and short survival. High levels of thymidine kinase in the upper ten percentile suggest increased risk of short survival. Pin1 is a prolyl isomerase that stimulates cell growth, in part through the transcriptional activation of the cyclin D1 gene, and levels in the upper ten percentile contribute to a negative prognostic profile.

8. Other Growth Factors and Receptors

This gene set includes IGF1, IGF2, IGFBP3, IGF1R, FGF2, FGFR1, CSF-1R/fms, CSF-1, IL6 and IL8. All of these proteins are expressed in breast cancer. Most stimulate tumor growth. However, expression of the growth factor FGF2 correlates with good outcome. Some have anti-apoptotic activity, prominently IGF1. Activation of the IGF1 axis via elevated IGF1, IGF1R, or IGFBP3 (as indicated by the sum of these signals in the upper ten percentile) inhibits tumor cell death and strongly contributes to a poor prognostic profile.

9. Gene Expression Markers that Define Subclasses of Breast Cancer

These include: GRO1 oncogene alpha, Grb7, cytokeratins 5 and 17, retinol binding protein 4, hepatocyte nuclear factor 3, integrin alpha 7, and lipoprotein lipase. These markers subset breast cancer into different cell types that are phenotypically different at the level of gene expression. Tumors expressing signals for Bcl2, hepatocyte nuclear factor 3, LIV1 and ER above the mean have the best prognosis for disease free and overall survival following surgical removal of the cancer. Another category of breast cancer tumor type, characterized by elevated expression of lipoprotein lipase, retinol binding protein 4, and integrin α7, carries intermediate prognosis. Tumors expressing either elevated levels of cytokeratins 5 and 17, GRO oncogene at levels four-fold or greater above the mean, or ErbB2 and Grb7 at levels ten-fold or more above the mean, have the worst prognosis.

Although throughout the present description, including the Examples below, various aspects of the invention are explained with reference to gene expression studies, the invention can be performed in a similar manner, and similar results can be reached by applying proteomics techniques that are well known in the art. The proteome is the totality of the proteins present in a sample (e.g. tissue, organism, or cell culture) at a certain point of time. Proteomics includes, among other things, study of the global changes of protein expression in a sample (also referred to as “expression proteomics”). Proteomics typically includes the following steps: (1) separation of individual proteins in a sample by 2-D gel electrophoresis (2-D PAGE); (2) identification of the individual proteins recovered from the gel, e.g. by mass spectrometry and/or N-terminal sequencing, and (3) analysis of the data using bioinformatics. Proteomics methods are valuable supplements to other methods of gene expression profiling, and can be used, alone or in combination with other methods of the present invention, to detect the products of the gene markers of the present invention.

Further details of the invention will be described in the following non-limiting Examples.

Example 1 Isolation of RNA from Formalin-Fixed, Paraffin-Embedded (FPET) Tissue Specimens

A. Protocols

I. EPICENTRE® Xylene Protocol

RNA Isolation

(1) Cut 1-6 sections (each 10 μm thick) of paraffin-embedded tissue per sample using a clean microtome blade and place into a 1.5 ml eppendorf tube.

(2) To extract paraffin, add 1 ml of xylene and invert the tubes for 10 minutes by rocking on a nutator.

(3) Pellet the sections by centrifugation for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(4) Remove the xylene, leaving some in the bottom to avoid dislodging the pellet.

(5) Repeat steps 2-4.

(6) Add 1 ml of 100% ethanol and invert for 3 minutes by rocking on the nutator.

(7) Pellet the debris by centrifugation for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(8) Remove the ethanol, leaving some at the bottom to avoid dislodging the pellet.

(9) Repeat steps 6-8 twice.

(10) Remove all of the remaining ethanol.

(11) For each sample, add 2 μl of 50 μg/μl Proteinase K to 300 μl of Tissue and Cell Lysis Solution.

(12) Add 300 μl of Tissue and Cell Lysis Solution containing the Proteinase K to each sample and mix thoroughly.

(13) Incubate at 65° C. for 90 minutes (vortex mixing every 5 minutes). Visually monitor the remaining tissue fragment. If still visible after 30 minutes, add an additional 2 μl of 50 μg/μl Proteinase K and continue incubating at 65° C. until fragment dissolves.

(14) Place the samples on ice for 3-5 minutes and proceed with protein removal and total nucleic acid precipitation.

Protein Removal and Precipitation of Total Nucleic Acid

(1) Add 150 μl of MPC Protein Precipitation Reagent to each lysed sample and vortex vigorously for 10 seconds.

(2) Pellet the debris by centrifugation for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(3) Transfer the supernatant into clean eppendorf tubes and discard the pellet.

(4) Add 500 μl of isopropanol to the recovered supernatant and thoroughly mix by rocking on the nutator for 3 minutes.

(5) Pellet the RNA/DNA by centrifugation at 4° C. for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(6) Remove all of the isopropanol with a pipet, being careful not to dislodge the pellet.

Removal of Contaminating DNA from RNA Preparations

(1) Prepare 200 μl of DNase I solution for each sample by adding 5 μl of RNase-Free DNase I (1 U/μl) to 195 μl of 1×DNase Buffer.

(2) Completely resuspend the pelleted RNA in 200 μl of DNase I solution by vortexing.

(3) Incubate the samples at 37° C. for 60 minutes.

(4) Add 200 μl of 2× T and C Lysis Solution to each sample and vortex for 5 seconds.

(5) Add 200 μl of MPC Protein Precipitation Reagent, mix by vortexing for 10 seconds and place on ice for 3-5 minutes.

(6) Pellet the debris by centrifugation for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(7) Transfer the supernatant containing the RNA to clean eppendorf tubes and discard the pellet. (Be careful to avoid transferring the pellet.)

(8) Add 500 μl of isopropanol to each supernatant and rock samples on the nutator for 3 minutes.

(9) Pellet the RNA by centrifugation at 4° C. for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(10) Remove the isopropanol, leaving some at the bottom to avoid dislodging the pellet.

(11) Rinse twice with 1 ml of 75% ethanol. Centrifuge briefly if the RNA pellet is dislodged.

(12) Remove ethanol carefully.

(13) Set under fume hood for about 3 minutes to remove residual ethanol.

(14) Resuspend the RNA in 30 μl of TE Buffer and store at −30° C.

II. Hot Wax/Urea Protocol of the Invention

RNA Isolation

(1) Cut 3 sections (each 10 μm thick) of paraffin-embedded tissue using a clean microtome blade and place into a 1.5 ml eppendorf tube.

(2) Add 300 μl of lysis buffer (10 mM Tris 7.5, 0.5% sodium lauroyl sarcosine, 0.1 mM EDTA pH 7.5, 4M Urea) containing 330 μg/ml Proteinase K (added freshly from a 50 μg/μl stock solution) and vortex briefly.

(3) Incubate at 65° C. for 90 minutes (vortex mixing every 5 minutes). Visually monitor the tissue fragment. If still visible after 30 minutes, add an additional 2 μl of 50 μg/μl Proteinase K and continue incubating at 65° C. until fragment dissolves.

(4) Centrifuge for 5 minutes at 14,000×g and transfer upper aqueous phase to new tube, being careful not to disrupt the paraffin seal.

(5) Place the samples on ice for 3-5 minutes and proceed with protein removal and total nucleic acid precipitation.
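As a non-limiting worked check of the Proteinase K concentration stated in step (2), the following sketch computes the final concentration obtained when a small volume of stock is added to lysis buffer. It assumes that 2 μl of the 50 μg/μl stock is added per 300 μl of buffer, the same proportion used in step (11) of the EPICENTRE® protocol; step (2) itself states only the final concentration, so that volume is an assumption, and the function name is illustrative.

```python
def final_concentration_ug_per_ml(stock_ug_per_ul, stock_volume_ul, buffer_volume_ul):
    """Concentration (in ug/ml) after adding stock solution to buffer."""
    total_ug = stock_ug_per_ul * stock_volume_ul
    total_volume_ml = (stock_volume_ul + buffer_volume_ul) / 1000.0
    return total_ug / total_volume_ml


# Assumed volumes: 2 ul of 50 ug/ul stock into 300 ul of lysis buffer.
concentration = final_concentration_ug_per_ml(50, 2, 300)
print(round(concentration))  # 331, i.e. approximately the stated 330 ug/ml
```

Under this assumption the result agrees with the nominal 330 μg/ml to within rounding.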

Protein Removal and Precipitation of Total Nucleic Acid

(1) Add 150 μl of 7.5M NH₄OAc to each lysed sample and vortex vigorously for 10 seconds.

(2) Pellet the debris by centrifugation for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(3) Transfer the supernatant into clean eppendorf tubes and discard the pellet.

(4) Add 500 μl of isopropanol to the recovered supernatant and thoroughly mix by rocking on the nutator for 3 minutes.

(5) Pellet the RNA/DNA by centrifugation at 4° C. for 10 minutes at 14,000×g in an eppendorf microcentrifuge.

(6) Remove all of the isopropanol with a pipet, being careful not to dislodge the pellet.

Removal of Contaminating DNA from RNA Preparations

(1) Add 45 μl of 1×DNase I buffer (10 mM Tris-Cl, pH 7.5, 2.5 mM MgCl₂, 0.1 mM CaCl₂) and 5 μl of RNase-Free DNase I (2 U/μl, Ambion) to each sample.

(2) Incubate the samples at 37° C. for 60 minutes. Inactivate the DNase I by heating at 70° C. for 5 minutes.

B. Results

Experimental evidence demonstrates that the hot RNA extraction protocol of the invention does not compromise RNA yield. Using 19 FPE breast cancer specimens, extracting RNA from three adjacent sections in the same specimens, RNA yields were measured via capillary electrophoresis with fluorescence detection (Agilent Bioanalyzer). Average RNA yields in nanograms and standard deviations with the invented and commercial methods, respectively, were: 139 ± 21 versus 141 ± 34.

Also, it was found that the urea-containing lysis buffer of the present invention can be substituted for the EPICENTRE® T&C lysis buffer, and the 7.5 M NH₄OAc reagent used for protein precipitation in accordance with the present invention can be substituted for the EPICENTRE® MPC protein precipitation solution with neither significant compromise of RNA yield nor TaqMan® efficiency.

Example 2 Amplification of mRNA Species Prior to RT-PCR

The method described in section 10 above was used with RNA isolated from fixed, paraffin-embedded breast cancer tissue. TaqMan® analyses were performed with first strand cDNA generated with the T7-GSP primer (unamplified (T7-GSPr)) and with T7 amplified RNA (amplified (T7-GSPr)). RNA was amplified according to step 2 of FIG. 4. As a control, TaqMan® was also performed with cDNA generated with an unmodified GSPr (amplified (GSPr)). An equivalent amount of initial template (1 ng/well) was used in each TaqMan® reaction.

The results are shown in FIG. 8. In vitro transcription increased RT-PCR signal intensity by more than 10 fold, and for certain genes by more than 100 fold relative to controls in which the RT-PCR primers were the same primers used in method 2 for the generation of double-stranded DNA for in vitro transcription (GSP-T7r and GSPf). Also shown in FIG. 8 are RT-PCR data generated when standard optimized RT-PCR primers (i.e., lacking T7 tails) were used. As shown, compared to this control, the new method yielded substantial increases in RT-PCR signal (from 4 to 64 fold in this experiment).

The new method requires that each T7-GSP sequence be optimized so that the increase in the RT-PCR signal is the same for each gene, relative to the standard optimized RT-PCR (with non-T7 tailed primers).

Example 3 A Study of Gene Expression in Premalignant and Malignant Breast Tumors

A gene expression study was designed and conducted with the primary goal to molecularly characterize gene expression in paraffin-embedded, fixed tissue samples of invasive breast ductal carcinoma, and to explore the correlation between such molecular profiles and disease-free survival. A further objective of the study was to compare the molecular profiles in tissue samples of invasive breast cancer with the molecular profiles obtained in ductal carcinoma in situ. The study was further designed to obtain data on the molecular profiles in lobular carcinoma in situ and in paraffin-embedded, fixed tissue samples of invasive lobular carcinoma.

Molecular assays were performed on paraffin-embedded, formalin-fixed primary breast tumor tissues obtained from 202 individual patients diagnosed with breast cancer. All patients underwent surgery with diagnosis of invasive ductal carcinoma of the breast, pure ductal carcinoma in situ (DCIS), lobular carcinoma of the breast, or pure lobular carcinoma in situ (LCIS). Patients were included in the study only if histopathologic assessment, performed as described in the Materials and Methods section, indicated adequate amounts of tumor tissue and homogeneous pathology.

The individuals participating in the study were divided into the following groups:

Group 1: Pure ductal carcinoma in situ (DCIS); n=18

Group 2: Invasive ductal carcinoma; n=130

Group 3: Pure lobular carcinoma in situ (LCIS); n=7

Group 4: Invasive lobular carcinoma; n=16

Materials and Methods

Each representative tumor block was characterized by standard histopathology for diagnosis, semi-quantitative assessment of amount of tumor, and tumor grade. A total of 6 sections (10 microns in thickness each) were prepared and placed in two Costar Brand Microcentrifuge Tubes (Polypropylene, 1.7 mL tubes, clear; 3 sections in each tube). If the tumor constituted less than 30% of the total specimen area, the sample may have been crudely dissected by the pathologist, using gross microdissection, putting the tumor tissue directly into the Costar tube.

If more than one tumor block was obtained as part of the surgical procedure, all tumor blocks were subjected to the same characterization, as described above, and the block most representative of the pathology was used for analysis.

Gene Expression Analysis

mRNA was extracted and purified from fixed, paraffin-embedded tissue samples, and prepared for gene expression analysis as described in chapters 7-11 above. Molecular assays of quantitative gene expression were performed by RT-PCR, using the ABI PRISM 7900™ Sequence Detection System™ (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA). ABI PRISM 7900™ consists of a thermocycler, laser, charge-coupled device (CCD), camera and computer. The system amplifies samples in a 384-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real-time through fiber optics cables for all 384 wells, and detected at the CCD. The system includes software for running the instrument and for analyzing the data.

Analysis and Results

Tumor tissue was analyzed for 185 cancer-related genes and 7 reference genes. The threshold cycle (CT) values for each patient were normalized based on the median of all genes for that particular patient. Clinical outcome data were available for all patients from a review of registry data and selected patient charts. Outcomes were classified as:

0 died due to breast cancer or to unknown cause or alive with breast cancer recurrence;
1 alive without breast cancer recurrence or died due to a cause other than breast cancer
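For illustration only, the per-patient median normalization described above may be sketched as follows. The text states that CT values were normalized based on the median of all genes for the patient but does not give the arithmetic; subtracting the patient's median CT is an assumption made here, and the function name and example values are hypothetical.

```python
from statistics import median


def normalize_ct(patient_ct):
    """Center one patient's CT values on that patient's median CT.

    patient_ct maps gene name -> raw CT for a single patient.
    Subtraction of the median is an assumed form of the normalization.
    """
    center = median(patient_ct.values())
    return {gene: ct - center for gene, ct in patient_ct.items()}


# Hypothetical raw CT values for one patient.
print(normalize_ct({"geneA": 30.0, "geneB": 28.0, "geneC": 35.0}))
```

Centering each patient separately removes differences in overall RNA input between specimens while preserving the differences between genes within a specimen.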

Analysis was performed by:

1. Analysis of the relationship between normalized gene expression and the binary outcomes of 0 or 1.
2. Analysis of the relationship between normalized gene expression and the time to outcome (0 or 1 as defined above) where patients who were alive without breast cancer recurrence or who died due to a cause other than breast cancer were censored. This approach was used to evaluate the prognostic impact of individual genes and also sets of multiple genes.

Analysis of 147 Patients with Invasive Breast Carcinoma by Binary Approach

In the first (binary) approach, analysis was performed on all 146 patients with invasive breast carcinoma. A t-test was performed on the group of patients classified as 0 or 1 and the p-values for the differences between the groups for each gene were calculated.
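A minimal sketch of the two-group comparison described above is given below, assuming an equal-variance (pooled) Student's t statistic computed on normalized CT values, with the difference taken as the mean of the first group minus the mean of the second. The source does not state which variant of the t-test was used, so the pooled form is an assumption; the function name is illustrative. The p-value would be obtained from the t distribution with the returned degrees of freedom, for example with a statistics library such as SciPy (`scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom.

    group_a and group_b are lists of normalized CT values for one gene.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

With two groups totalling 146 patients, this form gives 144 degrees of freedom.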

The following Table 4 lists the 45 genes for which the p-value for the differences between the groups was <0.05.

TABLE 4

Gene/SEQ ID NO:   Mean CT Alive   Mean CT Deceased   t-value   Degrees of freedom   p
FOXM1             33.66           32.52              3.92      144                  0.0001
PRAME             35.45           33.84              3.71      144                  0.0003
Bcl2              28.52           29.32              −3.53     144                  0.0006
STK15             30.82           30.10              3.49      144                  0.0006
CEGP1             29.12           30.86              −3.39     144                  0.0009
Ki-67             30.57           29.62              3.34      144                  0.0011
GSTM1             30.62           31.63              −3.27     144                  0.0014
CA9               34.96           33.54              3.18      144                  0.0018
PR                29.56           31.22              −3.16     144                  0.0019
BBC3              31.54           32.10              −3.10     144                  0.0023
NME1              27.31           26.68              3.04      144                  0.0028
SURV              31.64           30.68              2.92      144                  0.0041
GATA3             26.06           26.99              −2.91     144                  0.0042
TFRC              28.96           28.48              2.87      144                  0.0047
YB-1              26.72           26.41              2.79      144                  0.0060
DPYD              28.51           28.84              −2.67     144                  0.0084
GSTM3             28.21           29.03              −2.63     144                  0.0095
RPS6KB1           31.18           30.61              2.61      144                  0.0099
Src               27.97           27.69              2.59      144                  0.0105
Chk1              32.63           31.99              2.57      144                  0.0113
ID1               28.73           29.13              −2.48     144                  0.0141
EstR1             24.22           25.40              −2.44     144                  0.0160
p27               27.15           27.51              −2.41     144                  0.0174
CCNB1             31.63           30.87              2.40      144                  0.0176
XIAP              30.27           30.51              −2.40     144                  0.0178
Chk2              31.48           31.11              2.39      144                  0.0179
CDC25B            29.75           29.39              2.37      144                  0.0193
IGF1R             28.85           29.44              −2.34     144                  0.0209
AK055699          33.23           34.11              −2.28     144                  0.0242
PI3KC2A           31.07           31.42              −2.25     144                  0.0257
TGFB3             28.42           28.85              −2.25     144                  0.0258
BAGI1             28.40           28.75              −2.24     144                  0.0269
CYP3A4            35.70           35.32              2.17      144                  0.0317
EpCAM             28.73           28.34              2.16      144                  0.0321
VEGFC             32.28           31.82              2.16      144                  0.0326
pS2               28.96           30.60              −2.14     144                  0.0341
hENT1             27.19           26.91              2.12      144                  0.0357
WISP1             31.20           31.64              −2.10     144                  0.0377
HNF3A             27.89           28.64              −2.09     144                  0.0384
NFKBp65           33.22           33.80              −2.08     144                  0.0396
BRCA2             33.06           32.62              2.08      144                  0.0397
EGFR              30.68           30.13              2.06      144                  0.0414
TK1               32.27           31.72              2.02      144                  0.0453
VDR               30.08           29.73              1.99      144                  0.0488

In the foregoing Table 4, lower (negative) t-values indicate higher expression (or lower CTs), associated with better outcomes, and, inversely, higher (positive) t-values indicate higher expression (lower CTs) associated with worse outcomes. Thus, for example, elevated expression of the FOXM1 gene (t-value=3.92, CT mean alive>CT mean deceased) indicates a reduced likelihood of disease free survival. Similarly, elevated expression of the CEGP1 gene (t-value=−3.39; CT mean alive<CT mean deceased) indicates an increased likelihood of disease free survival.

Based on the data set forth in Table 4, the overexpression of any of the following genes in breast cancer indicates a reduced likelihood of survival without cancer recurrence following surgery: FOXM1; PRAME; STK15; Ki-67; CA9; NME1; SURV; TFRC; YB-1; RPS6KB1; Src; Chk1; CCNB1; Chk2; CDC25B; CYP3A4; EpCAM; VEGFC; hENT1; BRCA2; EGFR; TK1; VDR.

Based on the data set forth in Table 4, the overexpression of any of the following genes in breast cancer indicates a better prognosis for survival without cancer recurrence following surgery: Bcl2; CEGP1; GSTM1; PR; BBC3; GATA3; DPYD; GSTM3; ID1; EstR1; p27; XIAP; IGF1R; AK055699; PI3KC2A; TGFB3; BAGI1; pS2; WISP1; HNF3A; NFKBp65.
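The reading of Table 4 described above (positive t-value, poorer outcome with overexpression; negative t-value, better outcome) can be sketched as a simple sign-based split. The rows below are a small subset copied from Table 4 for illustration; the function name is hypothetical.

```python
# (gene, mean CT alive, mean CT deceased, t-value): a subset of Table 4 rows.
ROWS = [
    ("FOXM1", 33.66, 32.52, 3.92),
    ("PRAME", 35.45, 33.84, 3.71),
    ("Bcl2", 28.52, 29.32, -3.53),
    ("CEGP1", 29.12, 30.86, -3.39),
    ("VDR", 30.08, 29.73, 1.99),
]


def split_by_direction(rows):
    """Split genes into (poor_prognosis, good_prognosis) by the sign of t."""
    poor, good = [], []
    for gene, ct_alive, ct_deceased, t in rows:
        # A positive t-value must correspond to a higher mean CT in the alive group.
        if (ct_alive > ct_deceased) != (t > 0):
            raise ValueError(f"inconsistent row for {gene}")
        (poor if t > 0 else good).append(gene)
    return poor, good


print(split_by_direction(ROWS))
# (['FOXM1', 'PRAME', 'VDR'], ['Bcl2', 'CEGP1'])
```

The consistency check reflects the fact that a lower CT corresponds to higher expression, so the sign of the t-value and the ordering of the two mean CT values must agree.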

Analysis of 108 ER Positive Patients by Binary Approach

108 patients with normalized CT for estrogen receptor (ER) <25.2 (i.e., ER positive patients) were subjected to separate analysis. A t-test was performed on the groups of patients classified as 0 or 1 and the p-values for the differences between the groups for each gene were calculated. The following Table 5 lists the 12 genes where the p-value for the differences between the groups was <0.05.

TABLE 5

Gene/SEQ ID NO:   Mean CT Alive   Mean CT Deceased   t-value   Degrees of freedom   p
PRAME             35.54           33.88              3.03      106                  0.0031
Bcl2              28.24           28.87              −2.70     106                  0.0082
FOXM1             33.82           32.85              2.66      106                  0.0089
DIABLO            30.33           30.71              −2.47     106                  0.0153
EPHX1             28.62           28.03              2.44      106                  0.0163
HIF1A             29.37           28.88              2.40      106                  0.0180
VEGFC             32.39           31.69              2.39      106                  0.0187
Ki-67             30.73           29.82              2.38      106                  0.0191
IGF1R             28.60           29.18              −2.37     106                  0.0194
VDR               30.14           29.60              2.17      106                  0.0322
NME1              27.34           26.80              2.03      106                  0.0452
GSTM3             28.08           28.92              −2.00     106                  0.0485

For each gene, a classification algorithm was utilized to identify the best threshold value (CT) for using each gene alone in predicting clinical outcome.
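The classification algorithm is not further specified here. Purely as a hedged sketch of one way such a single-gene threshold could be found, the following code exhaustively tries the midpoints between adjacent observed CT values and keeps the threshold that classifies the most patients correctly. The exhaustive search, the accuracy criterion, and the function name are all assumptions and are not taken from the described study.

```python
def best_threshold(ct_values, outcomes):
    """Find the CT threshold that best separates outcome 0 from outcome 1.

    ct_values: normalized CT per patient for one gene.
    outcomes: 0 or 1 per patient, as defined in the text.
    Returns (threshold, accuracy, label_above), where label_above is the
    outcome predicted for patients whose CT exceeds the threshold.
    Returns None if fewer than two distinct CT values are supplied.
    """
    pairs = list(zip(ct_values, outcomes))
    distinct = sorted(set(ct_values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        for label_above in (0, 1):
            correct = sum(
                1
                for ct, outcome in pairs
                if (label_above if ct > threshold else 1 - label_above) == outcome
            )
            accuracy = correct / len(pairs)
            if best is None or accuracy > best[1]:
                best = (threshold, accuracy, label_above)
    return best
```

Trying both orientations of the rule allows the same routine to serve for genes whose overexpression indicates good prognosis and for genes whose overexpression indicates poor prognosis.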

Based on the data set forth in Table 5, overexpression of the following genes in ER-positive cancer is indicative of a reduced likelihood of survival without cancer recurrence following surgery: PRAME; FOXM1; EPHX1; HIF1A; VEGFC; Ki-67; VDR; NME1. Some of these genes (PRAME; FOXM1; VEGFC; Ki-67; VDR; and NME1) were also identified as indicators of poor prognosis in the previous analysis, not limited to ER-positive breast cancer. The overexpression of the remaining genes (EPHX1 and HIF1A) appears to be a negative indicator of disease free survival in ER-positive breast cancer only. Based on the data set forth in Table 5, overexpression of the following genes in ER-positive cancer is indicative of a better prognosis for survival without cancer recurrence following surgery: Bcl-2; DIABLO; IGF1R; GSTM3. Of the latter genes, Bcl-2; IGF1R; and GSTM3 have also been identified as indicators of good prognosis in the previous analysis, not limited to ER-positive breast cancer. The overexpression of DIABLO appears to be a positive indicator of disease free survival in ER-positive breast cancer only.

Analysis of Multiple Genes and Indicators of Outcome

Two approaches were taken in order to determine whether using multiple genes would provide better discrimination between outcomes.

First, a discrimination analysis was performed using a forward stepwise approach. Models were generated that classified outcome with greater discrimination than was obtained with any single gene alone.

According to a second approach (time-to-event approach), for each gene a Cox Proportional Hazards model (see, e.g. Cox, D. R., and Oakes, D. (1984), Analysis of Survival Data, Chapman and Hall, London, N.Y.) was defined with time to recurrence or death as the dependent variable, and the expression level of the gene as the independent variable. The genes that have a p-value<0.05 in the Cox model were identified. For each gene, the Cox model provides the relative risk (RR) of recurrence or death for a unit change in the expression of the gene. One can choose to partition the patients into subgroups at any threshold value of the measured expression (on the CT scale), where all patients with expression values above the threshold have higher risk, and all patients with expression values below the threshold have lower risk, or vice versa, depending on whether the gene is an indicator of good (RR>1.01) or poor (RR<1.01) prognosis. Thus, any threshold value will define subgroups of patients with respectively increased or decreased risk. The results are summarized in the following Tables 6 and 7.

TABLE 6 Cox Model Results for 146 Patients with Invasive Breast Cancer

Gene          Relative Risk (RR)   SE Relative Risk   p value
FOXM1         0.58                 0.15               0.0002
STK15         0.51                 0.20               0.0006
PRAME         0.78                 0.07               0.0007
Bcl2          1.66                 0.15               0.0009
CEGP1         1.25                 0.07               0.0014
GSTM1         1.40                 0.11               0.0014
Ki67          0.62                 0.15               0.0016
PR            1.23                 0.07               0.0017
Contig51037   0.81                 0.07               0.0022
NME1          0.64                 0.15               0.0023
YB-1          0.39                 0.32               0.0033
TFRC          0.53                 0.21               0.0035
BBC3          1.72                 0.19               0.0036
GATA3         1.32                 0.10               0.0039
CA9           0.81                 0.07               0.0049
SURV          0.69                 0.13               0.0049
DPYD          2.58                 0.34               0.0052
RPS6KB1       0.60                 0.18               0.0055
GSTM3         1.36                 0.12               0.0078
Src.2         0.39                 0.36               0.0094
TGFB3         1.61                 0.19               0.0109
CDC25B        0.54                 0.25               0.0122
XIAP          3.20                 0.47               0.0126
CCNB1         0.68                 0.16               0.0151
IGF1R         1.42                 0.15               0.0153
Chk1          0.68                 0.16               0.0155
ID1           1.80                 0.25               0.0164
p27           1.69                 0.22               0.0168
Chk2          0.52                 0.27               0.0175
EstR1         1.17                 0.07               0.0196
HNF3A         1.21                 0.08               0.0206
pS2           1.12                 0.05               0.0230
BAGI1         1.88                 0.29               0.0266
AK055699      1.24                 0.10               0.0276
hENT1         0.51                 0.31               0.0293
EpCAM         0.62                 0.22               0.0310
WISP1         1.39                 0.16               0.0338
VEGFC         0.62                 0.23               0.0364
TK1           0.73                 0.15               0.0382
NFKBp65       1.32                 0.14               0.0384
BRCA2         0.66                 0.20               0.0404
CYP3A4        0.60                 0.25               0.0417
EGFR          0.72                 0.16               0.0436

TABLE 7 Cox Model Results for 108 Patients with ER+ Invasive Breast Cancer

Gene          Relative Risk (RR)   SE Relative Risk   p-value
PRAME         0.75                 0.10               0.0045
Contig51037   0.75                 0.11               0.0060
Bcl2          2.11                 0.28               0.0075
HIF1A         0.42                 0.34               0.0117
IGF1R         1.92                 0.26               0.0117
FOXM1         0.54                 0.24               0.0119
EPHX1         0.43                 0.33               0.0120
Ki67          0.60                 0.21               0.0160
CDC25B        0.41                 0.38               0.0200
VEGFC         0.45                 0.37               0.0288
CTSB          0.32                 0.53               0.0328
DIABLO        2.91                 0.50               0.0328
p27           1.83                 0.28               0.0341
CDH1          0.57                 0.27               0.0352
IGFBP3        0.45                 0.40               0.0499
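As a reading aid for the relative risk values in Tables 6 and 7, the following sketch applies the stated convention (RR above the cutoff marks an indicator of good prognosis, RR below it an indicator of poor prognosis) and shows how a per-unit relative risk scales over several units. The scaling RR^k follows from the log-linear form of the Cox Proportional Hazards model; the function names are illustrative, and the two example values are taken from Table 6.

```python
def classify_by_rr(rr, cutoff=1.01):
    """Label a gene by its relative risk, using the cutoff stated in the text."""
    return "good" if rr > cutoff else "poor"


def rr_for_change(rr_per_unit, units):
    """Relative risk for a change of `units` on the CT scale.

    In a Cox model the log relative risk is linear in the covariate,
    so the relative risk for k units is the per-unit value raised to k.
    """
    return rr_per_unit ** units


print(classify_by_rr(1.66), classify_by_rr(0.58))  # good poor
print(round(rr_for_change(1.66, 2), 4))  # 2.7556
```

Because one unit on the CT scale corresponds to roughly a two-fold difference in template abundance, a change of several units compounds the relative risk multiplicatively rather than additively.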

The binary and time-to-event analyses, with few exceptions, identified the same genes as prognostic markers. For example, comparison of Tables 4 and 6 shows that, with the exception of a single gene, the two analyses generated the same list of top 15 markers (as defined by the smallest p values). Furthermore, when both analyses identified the same gene, they were concordant with respect to the direction (positive or negative sign) of the correlation with survival/recurrence. Overall, these results strengthen the conclusion that the identified markers have significant prognostic value.
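The comparison of the top 15 markers can be reproduced with a short sketch. The two lists below are the first 15 gene names of Table 4 and of Table 6, taken in the order in which the tables list them; the helper names are hypothetical, and the spelling normalization (for example "Ki-67" versus "Ki67") is an assumption introduced so that the same gene matches across tables.

```python
TABLE4_TOP15 = ["FOXM1", "PRAME", "Bcl2", "STK15", "CEGP1", "Ki-67", "GSTM1",
                "CA9", "PR", "BBC3", "NME1", "SURV", "GATA3", "TFRC", "YB-1"]
TABLE6_TOP15 = ["FOXM1", "STK15", "PRAME", "Bcl2", "CEGP1", "GSTM1", "Ki67",
                "PR", "Contig51037", "NME1", "YB-1", "TFRC", "BBC3", "GATA3", "CA9"]


def canonical(name):
    """Normalize spelling differences such as 'Ki-67' versus 'Ki67'."""
    return name.replace("-", "").upper()


def compare_lists(list_a, list_b):
    """Return (shared, only_in_a, only_in_b) as sorted canonical names."""
    a = {canonical(g) for g in list_a}
    b = {canonical(g) for g in list_b}
    return sorted(a & b), sorted(a - b), sorted(b - a)


shared, only_binary, only_cox = compare_lists(TABLE4_TOP15, TABLE6_TOP15)
print(len(shared), only_binary, only_cox)
# 14 ['SURV'] ['CONTIG51037']
```

Fourteen of the fifteen markers are shared; SURV appears only in the binary list and Contig51037 only in the Cox list.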

For Cox models comprising more than two genes (multivariate models), stepwise entry of each individual gene into the model is performed, where the first gene entered is pre-selected from among those genes having significant univariate p-values, and the gene selected for entry into the model at each subsequent step is the gene that best improves the fit of the model to the data. This analysis can be performed with any total number of genes. In the analysis the results of which are shown below, stepwise entry was performed for up to 10 genes.

Multivariate analysis is performed using the following equation:

RR = exp[coef(geneA) × Ct(geneA) + coef(geneB) × Ct(geneB) + coef(geneC) × Ct(geneC) + . . . ].

In this equation, coefficients for genes that are predictors of beneficial outcome are positive numbers and coefficients for genes that are predictors of unfavorable outcome are negative numbers. The “Ct” values in the equation are ΔCts, i.e. reflect the difference between the average normalized Ct value for a population and the normalized Ct measured for the patient in question. The convention used in the present analysis has been that ΔCts below and above the population average have positive signs and negative signs, respectively (reflecting greater or lesser mRNA abundance). The relative risk (RR) calculated by solving this equation will indicate if the patient has an enhanced or reduced chance of long-term survival without cancer recurrence.
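A minimal sketch of the multivariate calculation is given below. It takes ΔCt as the population average normalized Ct minus the patient's normalized Ct, so that a patient Ct below the population average yields a positive ΔCt, in keeping with the sign convention just described. The coefficient values, gene names, and Ct values in the example are hypothetical placeholders and are not results of the study.

```python
from math import exp


def delta_ct(population_mean_ct, patient_ct):
    """Population average minus patient Ct: positive when the patient's Ct
    is below the average (greater mRNA abundance)."""
    return population_mean_ct - patient_ct


def relative_risk(coefs, population_mean_ct, patient_ct):
    """RR = exp(sum over genes of coef(gene) x delta Ct(gene)).

    All three arguments map gene name -> value; only genes present in
    `coefs` enter the sum.
    """
    score = sum(
        coefs[gene] * delta_ct(population_mean_ct[gene], patient_ct[gene])
        for gene in coefs
    )
    return exp(score)


# Hypothetical two-gene model and patient, for illustration only.
coefs = {"geneA": 0.5, "geneB": -0.25}
population = {"geneA": 30.0, "geneB": 28.0}
patient = {"geneA": 29.0, "geneB": 30.0}
print(round(relative_risk(coefs, population, patient), 4))  # 2.7183
```

In the example, geneA contributes 0.5 × 1 and geneB contributes −0.25 × −2, giving a summed score of 1.0 and a value of e for the expression.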

Multivariate Gene Analysis of 147 Patients with Invasive Breast Carcinoma

(a) A multivariate stepwise analysis, using the Cox Proportional Hazards Model, was performed on the gene expression data obtained for all 147 patients with invasive breast carcinoma. Genes CEGP1, FOXM1, STK15 and PRAME were excluded from this analysis. The following ten-gene sets have been identified by this analysis as having particularly strong predictive value of patient survival without cancer recurrence following surgical removal of primary tumor.

1. Bcl2, cyclinG1, NFKBp65, NME1, EPHX1, TOP2B, DR5, TERC, Src, DIABLO;
2. Ki67, XIAP, hENT1, TS, CD9, p27, cyclinG1, pS2, NFKBp65, CYP3A4;
3. GSTM1, XIAP, Ki67, TS, cyclinG1, p27, CYP3A4, pS2, NFKBp65, ErbB3;
4. PR, NME1, XIAP, upa, cyclinG1, Contig51037, TERC, EPHX1, ALDH1A3, CTSL;
5. CA9, NME1, TERC, cyclinG1, EPHX1, DPYD, Src, TOP2B, NFKBp65, VEGFC;
6. TFRC, XIAP, Ki67, TS, cyclinG1, p27, CYP3A4, pS2, ErbB3, NFKBp65.

(b) A multivariate stepwise analysis, using the Cox Proportional Hazards Model, was performed on the gene expression data obtained for all 147 patients with invasive breast carcinoma, using an interrogation set including a reduced number of genes. The following ten-gene sets have been identified by this analysis as having particularly strong predictive value of patient survival without cancer recurrence following surgical removal of primary tumor.

1. Bcl2, PRAME, cyclinG1, FOXM1, NFKBp65, TS, XIAP, Ki67, CYP3A4, p27;
2. FOXM1, cyclinG1, XIAP, Contig51037, PRAME, TS, Ki67, PDGFRa, p27, NFKBp65;
3. PRAME, FOXM1, cyclinG1, XIAP, Contig51037, TS, Ki67, PDGFRa, p27, NFKBp65;
4. Ki67, XIAP, PRAME, hENT1, Contig51037, TS, CD9, p27, ErbB3, cyclinG1;
5. STK15, XIAP, PRAME, PLAUR, p27, CTSL, CD18, PREP, p53, RPS6KB1;
6. GSTM1, XIAP, PRAME, p27, Contig51037, ErbB3, GSTp, EREG, ID1, PLAUR;
7. PR, PRAME, NME1, XIAP, PLAUR, cyclinG1, Contig51037, TERC, EPHX1, DR5;
8. CA9, FOXM1, cyclinG1, XIAP, TS, Ki67, NFKBp65, CYP3A4, GSTM3, p27;
9. TFRC, XIAP, PRAME, p27, Contig51037, ErbB3, DPYD, TERC, NME1, VEGFC;
10. CEGP1, PRAME, hENT1, XIAP, Contig51037, ErbB3, DPYD, NFKBp65, ID1, TS.

Multivariate Analysis of Patients with ER Positive Invasive Breast Carcinoma

A multivariate stepwise analysis, using the Cox Proportional Hazards Model, was performed on the gene expression data obtained for patients with ER positive invasive breast carcinoma. The following ten-gene sets have been identified by this analysis as having particularly strong predictive value of patient survival without cancer recurrence following surgical removal of primary tumor.

1. PRAME, p27, IGFBP2, HIF1A, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
2. Contig51037, EPHX1, Ki67, TIMP2, cyclinG1, DPYD, CYP3A4, TP, AIB1, CYP2C8;
3. Bcl2, hENT1, FOXM1, Contig51037, cyclinG1, Contig46653, PTEN, CYP3A4, TIMP2, AREG;
4. HIF1A, PRAME, p27, IGFBP2, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
5. IGF1R, PRAME, EPHX1, Contig51037, cyclinG1, Bcl2, NME1, PTEN, TBP, TIMP2;
6. FOXM1, Contig51037, VEGFC, TBP, HIF1A, DPYD, RAD51C, DCR3, cyclinG1, BAG1;
7. EPHX1, Contig51037, Ki67, TIMP2, cyclinG1, DPYD, CYP3A4, TP, AIB1, CYP2C8;
8. Ki67, VEGFC, VDR, GSTM3, p27, upa, ITGA7, rhoC, TERC, Pin1;
9. CDC25B, Contig51037, hENT1, Bcl2, HLAG, TERC, NME1, upa, ID1, CYP;
10. VEGFC, Ki67, VDR, GSTM3, p27, upa, ITGA7, rhoC, TERC, Pin1;
11. CTSB, PRAME, p27, IGFBP2, EPHX1, CTSL, BAD, DR5, DCR3, XIAP;
12. DIABLO, Ki67, hENT1, TIMP2, ILT2, p27, KRT19, IGFBP2, TS, PDGFB;
13. p27, PRAME, IGFBP2, HIF1A, TIMP2, ILT2, CYP3A4, ID1, EstR1, DIABLO;
14. CDH1, PRAME, VEGFC, HIF1A, DPYD, TIMP2, CYP3A4, EstR1, RBP4, p27;
15. IGFBP3, PRAME, p27, Bcl2, XIAP, EstR1, Ki67, TS, Src, VEGF;
16. GSTM3, PRAME, p27, IGFBP3, XIAP, FGF2, hENT1, PTEN, EstR1, APC;
17. hENT1, Bcl2, FOXM1, Contig51037, cyclinG1, Contig46653, PTEN, CYP3A4, TIMP2, AREG;
18. STK15, VEGFC, PRAME, p27, GCLC, hENT1, ID1, TIMP2, EstR1, MCP1;
19. NME1, PRAME, p27, IGFBP3, XIAP, PTEN, hENT1, Bcl2, CYP3A4, HLAG;
20. VDR, Bcl2, p27, hENT1, p53, PI3KC2A, EIF4E, TFRC, MCM3, ID1;
21. EIF4E, Contig51037, EPHX1, cyclinG1, Bcl2, DR5, TBP, PTEN, NME1, HER2;
22. CCNB1, PRAME, VEGFC, HIF1A, hENT1, GCLC, TIMP2, ID1, p27, upa;
23. ID1, PRAME, DIABLO, hENT1, p27, PDGFRa, NME1, BIN1, BRCA1, TP;
24. FBXO5, PRAME, IGFBP3, p27, GSTM3, hENT1, XIAP, FGF2, TS, PTEN;
25. GUS, HIF1A, VEGFC, GSTM3, DPYD, hENT1, FBXO5, CA9, CYP, KRT18;
26. Bclx, Bcl2, hENT1, Contig51037, HLAG, CD9, ID1, BRCA1, BIN1, HBEGF.

It is noteworthy that many of the foregoing gene sets include genes that alone did not have sufficient predictive value to qualify as prognostic markers under the standards discussed above, but in combination with other genes, their presence provides valuable information about the likelihood of long-term patient survival without cancer recurrence.
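As a hedged illustration of how individually modest genes can combine under a Cox Proportional Hazards Model, the relative hazard of a patient is the exponential of a weighted sum of that patient's expression values. The Python sketch below computes this quantity. The coefficient values, their signs and the expression values are hypothetical placeholders chosen for illustration and are not results of the analysis described herein.

```python
import math


def relative_hazard(expression, coefficients):
    """Relative hazard exp(sum of beta_g * x_g) versus the baseline hazard.

    expression   -- normalized expression value per gene for one patient
    coefficients -- Cox regression coefficient per gene (hypothetical values)
    """
    linear_predictor = sum(
        beta * expression[gene] for gene, beta in coefficients.items()
    )
    return math.exp(linear_predictor)


# Hypothetical coefficients for three genes; none is large on its own.
HYPOTHETICAL_COEFFICIENTS = {"Ki67": 0.3, "XIAP": 0.3, "Bcl2": -0.3}

# A reference patient at the baseline expression level of every gene.
baseline_patient = {"Ki67": 0.0, "XIAP": 0.0, "Bcl2": 0.0}
# A patient one unit away from baseline in the unfavorable direction per gene.
example_patient = {"Ki67": 1.0, "XIAP": 1.0, "Bcl2": -1.0}

baseline_hazard = relative_hazard(baseline_patient, HYPOTHETICAL_COEFFICIENTS)
combined_hazard = relative_hazard(example_patient, HYPOTHETICAL_COEFFICIENTS)
```

In this toy setting each gene alone would change the hazard by a factor of exp(0.3), while the three together change it by exp(0.9), which is the sense in which a combination can carry more information than any single member.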

All references cited throughout the disclosure are hereby expressly incorporated by reference.

While the present invention has been described with reference to what are considered to be the specific embodiments, it is to be understood that the invention is not limited to such embodiments. To the contrary, the invention is intended to cover various modifications and equivalents included within the spirit and scope of the appended claims. For example, while the disclosure focuses on the identification of various breast cancer associated genes and gene sets, and on the diagnosis and treatment of breast cancer, similar genes, gene sets and methods concerning other types of cancer are specifically within the scope herein.

TABLE 1

1. ADD3 (adducin 3 gamma)*
2. AKT1/Protein Kinase B
3. AKT 2
4. AKT 3
5. Aldehyde dehydrogenase 1A1
6. Aldehyde dehydrogenase 1A3
7. amphiregulin
8. APC
9. ARG
10. ATM
11. Bak
12. Bax
13. Bcl2
14. Bcl-xl
15. BRK
16. BCRP
17. BRCA-1
18. BRCA-2
19. Caspase-3
20. Cathepsin B
21. Cathepsin G
22. Cathepsin L
23. CD3
24. CD9
25. CD18
26. CD31
27. CD44^
28. CD68
29. CD82/KAI-1
30. Cdc25A
31. Cdc25B
32. CGA
33. COX2
34. CSF-1
35. CSF-1R/fms
36. cIAP1
37. cIAP2
38. c-abl
39. c-kit
40. c-kit L
41. c-met
42. c-myc
43. cN-1
44. cryptochrome1*
45. c-Src
46. Cyclin D1
47. CYP1B1
48. CYP2C9*
49. Cytokeratin 5^
50. Cytokeratin 17^
51. Cytokeratin 18^
52. DAP-Kinase-1
53. DHFR
54. DIABLO
55. Dihydropyrimidine dehydrogenase
56. EGF
57. ECadherin/CDH1^
58. ELF 3*
59. Endothelin
60. Epiregulin
61. ER-alpha^
62. ErbB-1
63. ErbB-2^
64. ErbB-3
65. ErbB-4
66. ER-Beta
67. Eukaryotic Translation Initiation Factor 4B* (EIF4B)
68. EIF4E
69. farnesyl pyrophosphate synthetase
70. FAS (CD95)
71. FasL
72. FGF R 1*
73. FGF2 [bFGF]
74. 53BP1
75. 53BP2
76. GALC (galactosylceramidase)*
77. Gamma-GCS (glutamyl cysteine synthetase)
78. GATA3^
79. geranyl geranyl pyrophosphate synthetase
80. G-CSF
81. GPC3
82. gravin* [AK AP258]
83. GRO1 oncogene alpha^
84. Grb7^
85. GST-alpha
86. GST-pi^
87. Ha-Ras
88. HB-EGF
89. HE4-extracellular Proteinase Inhibitor Homologue*
90. hepatocyte nuclear factor 3^
91. HER-2
92. HGF/Scatter factor
93. hIAP1
94. hIAP2
95. HIF-1
96. human kallikrein 10
97. MLH1
98. hsp 27
99. human chorionic gonadotropin/CGA
100. Human Extracellular Protein S1-5
101. Id-1
102. Id-2
103. Id-3
104. IGF-1
105. IGF2
106. IGF1R
107. IGFBP3
108. interstitial integrin alpha 7
109. IL6
110. IL8
111. IRF-2*
112. IRF9 Protein
113. Kallikrein 5
114. Kallikrein 6
115. KDR
116. Ki-67/MiB1
117. lipoprotein lipase^
118. LIV1
119. Lung Resistance Protein/MVP
120. Lot1
121. Maspin
122. MCM2
123. MCM3
124. MCM7
125. MCP-1
126. microtubule-associated protein 4
127. MCJ
128. mdm2
129. MDR-1
130. microsomal epoxide hydrolase
131. MMP9
132. MRP1
133. MRP2
134. MRP3
135. MRP4
136. MSN (Moesin)*
137. mTOR
138. Muc1/CA 15-3
139. NF-kB
140. P14ARF
141. P16INK4a/p14
142. p21WAF1/CIP1
143. p23
144. p27
145. p311*
146. p53
147. PAI1
148. PCNA
149. PDGF-A
150. PDGF-B
151. PDGF-C
152. PDGF-D
153. PDGFR-α
154. PDGFR-β
155. PI3K
156. Pin1
157. PKC-ε
158. PKC-δ
159. PLAG1 (pleiomorphic aden[...] 1)*
160. PREP prolyl endopeptidase
161. Progesterone receptor
162. pS2/trefoil factor 1
163. PTEN
164. PTP1b
165. RAR-alpha
166. RAR-beta2
167. RCP
168. Reduced Folate Carrier
169. Retinol binding protein 4^
170. STK15/BTAK
171. Survivin
172. SXR
173. Syk
174. TGD (thymine-DNA glycosylase)*
175. TGFalpha
176. Thymidine Kinase
177. Thymidine phosphorylase
178. Thymidylate Synthase
179. Topoisomerase II-α
180. Topoisomerase II-β
181. TRAMP
182. UPA
183. VEGF
184. Vimentin
185. WTH3
186. XAF1
187. XIAP
188. XIST
189. XPA
190. YB-1

* NCI 60 drug Sens./Resist Marker
^ In Cluster Defining tumor subclass
Jan. 19, 2002

indicates data missing or illegible when filed

TABLE 2

Gene      Accession No.  Forward Primer  Reverse Primer  Amplicon
                         SEQ ID NO.      SEQ ID NO.      SEQ ID NO.
ABCB1     NM_000927      1               2               3
ABCC1     NM_004996      4               5               6
ABCC2     NM_000392      7               8               9
ABCC3     NM_003786      10              11              12
ABCC4     NM_005845      13              14              15
ABL1      NM_005157      16              17              18
ABL2      NM_005158      19              20              21
ACTB      NM_001101      22              23              24
AKT1      NM_005163      25              26              27
AKT3      NM_005465      28              29              30
ALDH1     NM_000689      31              32              33
ALDH1A3   NM_000693      34              35              36
APC       NM_000038      37              38              39
AREG      NM_001657      40              41              42
B2M       NM_004048      43              44              45
BAK1      NM_001188      46              47              48
BAX       NM_004324      49              50              51
BCL2      NM_000633      52              53              54
BCL2L1    NM_001191      55              56              57
BIRC3     NM_001165      58              59              60
BIRC4     NM_001167      61              62              63
BIRC5     NM_001168      64              65              66
BRCA1     NM_007295      67              68              69
BRCA2     NM_000059      70              71              72
CCND1     NM_001758      73              74              75
CD3Z      NM_000734      76              77              78
CD68      NM_001251      79              80              81
CDC25A    NM_001789      82              83              84
CDH1      NM_004360      85              86              87
CDKN1A    NM_000389      88              89              90
CDKN1B    NM_004064      91              92              93
CDKN2A    NM_000077      94              95              96
CYP1B1    NM_000104      97              98              99
DHFR      NM_000791      100             101             102
DPYD      NM_000110      103             104             105
ECGF1     NM_001953      106             107             108
EGFR      NM_005228      109             110             111
EIF4E     NM_001968      112             113             114
ERBB2     NM_004448      115             116             117
ERBB3     NM_001982      118             119             120
ESR1      NM_000125      121             122             123
ESR2      NM_001437      124             125             126
GAPD      NM_002046      127             128             129
GATA3     NM_002051      130             131             132
GRB7      NM_005310      133             134             135
GRO1      NM_001511      136             137             138
GSTP1     NM_000852      139             140             141
GUSB      NM_000181      142             143             144
hHGF      M29145         145             146             147
HNF3A     NM_004496      148             149             150
ID2       NM_002166      151             152             153
IGF1      NM_000618      154             155             156
IGFBP3    NM_000598      157             158             159
ITGA7     NM_002206      160             161             162
ITGB2     NM_000211      163             164             165
KDR       NM_002253      166             167             168
KIT       NM_000222      169             170             171
KITLG     NM_000899      172             173             174
KRT17     NM_000422      175             176             177
KRT5      NM_000424      178             179             180
LPL       NM_000237      181             182             183
MET       NM_000245      184             185             186
MKI67     NM_002417      187             188             189
MVP       NM_017458      190             191             192
MYC       NM_002467      193             194             195
PDGFA     NM_002607      196             197             198
PDGFB     NM_002608      199             200             201
PDGFC     NM_016205      202             203             204
PDGFRA    NM_006206      205             206             207
PDGFRB    NM_002609      208             209             210
PGK1      NM_000291      211             212             213
PGR       NM_000926      214             215             216
PIN1      NM_006221      217             218             219
PLAU      NM_002658      220             221             222
PPIH      NM_006347      223             224             225
PTEN      NM_000314      226             227             228
PTGS2     NM_000963      229             230             231
RBP4      NM_006744      232             233             234
RELA      NM_021975      235             236             237
RPL19     NM_000981      238             239             240
RPLP0     NM_001002      241             242             243
SCDGF-B   NM_025208      244             245             246
SERPINE1  NM_000602      247             248             249
SLC19A1   NM_003056      250             251             252
TBP       NM_003194      253             254             255
TFF1      NM_003225      256             257             258
TFRC      NM_003234      259             260             261
TK1       NM_003258      262             263             264
TNFRSF6   NM_000043      265             266             267
TNFSF6    NM_000639      268             269             270
TOP2A     NM_001067      271             272             273
TOP2B     NM_001068      274             275             276
TP53      NM_000546      277             278             279
TYMS      NM_001071      280             281             282
VEGF      NM_003376      283             284             285

TABLE 3

Gene         Accession No.  SEQ ID NO:
AK055699     AK055699       286
BAG1         NM_004323      287
BBC3         NM_014417      288
Bcl2         NM_000633      289
BRCA2        NM_000059      290
CA9          NM_001216      291
CCNB1        NM_031966      292
CDC25B       NM_021874      293
CEGP1        NM_020974      294
Chk1         NM_001274      295
Chk2         NM_007194      296
CYP3A4       NM_017460      297
DIABLO       NM_019887      298
DPYD         NM_000110      299
EGFR         NM_005228      300
EpCAM        NM_002354      301
EPHX1        NM_000120      302
EstR1        NM_000125      303
FOXM1        NM_021953      304
GATA3        NM_002051      305
GSTM1        NM_000561      306
GSTM3        NM_000849      307
hENT1        NM_004955      308
HIF1A        NM_001530      309
HNF3A        NM_004496      310
ID1          NM_002165      311
IGF1R        NM_000875      312
Ki-67        NM_002417      313
NFKBp65      NM_021975      314
NME1         NM_000269      315
p27          NM_004064      316
PI3KC2A      NM_002645      317
PR           NM_000926      318
PRAME        NM_006115      319
pS2          NM_003225      320
RPS6KB1      NM_003161      321
Src          NM_004383      322
STK15        NM_003600      323
SURV         NM_001168      324
TFRC         NM_003234      325
TGFB3        NM_003239      326
TK1          NM_003258      327
VDR          NM_000376      328
VEGFC        NM_005429      329
WISP1        NM_003882      330
XIAP         NM_001167      331
YB-1         NM_004559      332
ITGA7        NM_002206      333
PDGFB        NM_002608      334
Upa          NM_002658      335
TBP          NM_003194      336
PDGFRa       NM_006206      337
Pin1         NM_006221      338
CYP          NM_006347      339
RBP4         NM_006744      340
BRCA1        NM_007295      341
APC          NM_000038      342
GUS          NM_000181      343
CD18         NM_000211      344
PTEN         NM_000314      345
P53          NM_000546      346
ALDH1A3      NM_000693      347
GSTp         NM_000852      348
TOP2B        NM_001068      349
TS           NM_001071      350
Bclx         NM_001191      351
AREG         NM_001657      352
TP           NM_001953      353
EIF4E        NM_001968      354
ErbB3        NM_001982      355
EREG         NM_001432      356
GCLC         NM_001498      357
CD9          NM_001769      358
HB-EGF       NM_001945      359
IGFBP2       NM_000597      360
CTSL         NM_001912      361
PREP         NM_002726      362
CYP3A4       NM_017460      363
ILT-2        NM_006669      364
MCM3         NM_002388      365
KRT19        NM_002276      366
KRT18        NM_000224      367
TIMP2        NM_003255      368
BAD          NM_004322      369
CYP2C8       NM_030878      370
DCR3         NM_016434      371
PLAUR        NM_002659      372
PI3KC2A      NM_002645      373
FGF2         NM_002006      374
HLA-G        NM_002127      375
AIB1         NM_006534      376
MCP1         NM_002982      377
Contig46653  Contig46653    378
RhoC         NM_005167      379
DR5          NM_003842      380
RAD51C       NM_058216      381
BIN1         NM_004305      382
VDR          NM_000376      383
TERC         U86046         384

1.-45. (canceled)
46. A method comprising: assaying a level of a RNA transcript of CEGP1 in a tissue sample obtained from a primary ductal or lobular breast tumor of a human patient; normalizing said level against a level of at least one reference RNA transcript in said tissue sample to provide a normalized CEGP1 expression level; and predicting the likelihood of long-term survival of said patient without recurrence of breast cancer by comparing said normalized CEGP1 expression level to CEGP1 expression data obtained from reference breast cancer samples, wherein an increased normalized CEGP1 expression level is positively correlated with an increased likelihood of long-term survival without breast cancer recurrence in said patients.
47. The method of claim 46 further comprising assaying a level of a RNA transcript of one or more genes selected from the group consisting of: STK15, Ki-67, PR, GSTM3, ESR1, HNF3A, BIRC5, BAG1, BCL2, CCNB1, and GSTM1 in said tissue sample; normalizing the level of the RNA transcript of the one or more genes against a level of at least one reference RNA transcript in said tissue sample to provide a normalized level of said one or more genes; and comparing said normalized level of said one or more genes to gene expression data from said one or more genes obtained from reference breast cancer samples, wherein increased expression of one or more of BIRC5, CCNB1, STK15 and Ki-67, negatively correlates with an increased likelihood of long-term survival without breast cancer recurrence, and increased expression of one or more of BAG1, BCL2, PR, GSTM1, GSTM3, ESR1 and HNF3A positively correlates with an increased likelihood of long-term survival without breast cancer recurrence.
48. The method of claim 46 wherein the breast tumor is an invasive breast tumor, and said method further comprises assaying a level of a RNA transcript of one or more genes selected from the group consisting of: FOXM1, PRAME, BCL2, STK15, Ki-67, PR, BBC3, NME1, BIRC5, GATA3, TFRC, YB-1, DPYD, CA9, Contig51037, RPS6K1 and Her2 in said tissue sample.
49. The method of claim 46 wherein said breast tumor is estrogen receptor (ER) positive breast tumor.
50. The method of claim 49 further comprising assaying a level of a RNA transcript of one or more genes selected from the group consisting of: PRAME, BCL2, FOXM1, DIABLO, EPHX1, HIF1A, VEGFC, Ki-67, IGF1R, VDR, NME1, GSTM3, Contig51037, CDC25B, CTSB, p27, CDH1, and IGFBP3 in said tissue sample.
51. The method of claim 47 wherein the levels of 2 or more RNA transcripts are assayed.
52. The method of claim 46, wherein said tissue sample is a fixed, wax-embedded breast cancer tissue specimen of said patient.
53. The method of claim 46, wherein said tissue sample is from a fine needle biopsy.
54. The method of claim 46, further comprising creating a report based upon the normalized CEGP1 expression level.
55. The method of claim 54, wherein said report includes a prediction of the likelihood of long term survival of said patient without the recurrence of breast cancer.
56. The method of claim 55, wherein said report comprises information concerning a recommendation for a treatment modality of said patient.
57. The method of claim 46, wherein said gene expression data is produced using a multivariate analysis using the Cox Proportional Hazards model.
58. The method of claim 46 wherein said assaying is done by reverse transcriptase polymerase chain reaction (RT-PCR).
59. The method of claim 46, wherein said assaying is done after a primary ductal carcinoma has been surgically removed from a breast of said patient.
60. The method of claim 59, wherein said primary ductal carcinoma is an invasive ductal carcinoma.
61. The method of claim 46, wherein said assaying is done after a primary lobular carcinoma has been surgically removed from a breast of said patient.
62. The method of claim 61, wherein said primary lobular carcinoma is an invasive lobular carcinoma.